import pandas as pd

data = pd.read_csv("../archivos/npr.csv").sample(1000)
print(f"We have {data.shape[0]:,d} documents.")

data.head()

We have 1,000 documents.


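import re

from nltk.corpus import stopwords

# Store the stop words as a set for fast membership tests, under a name
# that does not shadow the imported `stopwords` corpus reader.
stop_words = set(stopwords.words('english'))


def pre_procesado(texto):
    """Lowercase the text, drop non-word characters and digits, remove stop words."""
    texto = texto.lower()
    # Replace every run of non-word characters and digits with a single space
    texto = re.sub(r"[\W\d]+", " ", texto)
    return [palabra for palabra in texto.split() if palabra not in stop_words]

data['Pre-Processed'] = data['Article'].apply(pre_procesado)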

data.head()
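To see what the cleaning step does to a single sentence, here is a minimal sketch of the same three operations. It assumes a tiny hand-written stop word set (`SAMPLE_STOPWORDS`, hypothetical) instead of the NLTK list, so it runs without downloading any corpus; the function name `pre_process_sketch` is likewise illustrative only.

```python
import re

# Hypothetical mini stop word set, used only so the sketch runs without NLTK data.
SAMPLE_STOPWORDS = {"the", "in", "a", "of", "and", "s"}


def pre_process_sketch(text, stop_words=SAMPLE_STOPWORDS):
    """Lowercase, replace non-word characters and digits with spaces, drop stop words."""
    text = text.lower()
    text = re.sub(r"[\W\d]+", " ", text)
    return [word for word in text.split() if word not in stop_words]


tokens = pre_process_sketch("The U.S. Senate voted 52-48 in 2017!")
print(tokens)  # ['u', 'senate', 'voted']
```

Note how "U.S." is split into single letters and the numbers disappear entirely. This is consistent with the stray `u` token that shows up in the `Pre-Processed` column and in several topics.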


import pyLDAvis.gensim_models
from gensim.models import LdaModel 
from gensim.corpora import Dictionary
from pprint import pprint


# Build a dictionary (token -> id mapping) from the tokenized documents
dictionary = Dictionary(data['Pre-Processed'].values)

# Drop tokens that appear in fewer than 5 documents or in more than 50% of them
dictionary.filter_extremes(no_below=5, no_above=0.5)

# Corpus: bag-of-words representation of each document
corpus = [dictionary.doc2bow(text) for text in data['Pre-Processed'].values]

# Train the model
model = LdaModel(corpus=corpus, id2word=dictionary, num_topics=7, passes=10)
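The dictionary filtering and bag-of-words steps above can be illustrated in plain Python. This is a rough sketch of the idea behind `filter_extremes` and `doc2bow`, not gensim's actual implementation: the toy documents, the thresholds, and the helper `doc2bow_sketch` are assumptions made for the example, and gensim assigns token ids differently.

```python
from collections import Counter

# Toy tokenized documents (hypothetical data for illustration only).
docs = [
    ["trump", "election", "court"],
    ["health", "care", "election"],
    ["music", "album", "election"],
    ["health", "insurance", "court"],
]

# Document frequency: in how many documents does each token appear?
doc_freq = Counter(token for doc in docs for token in set(doc))

# Keep tokens present in at least `no_below` documents
# and in at most a fraction `no_above` of all documents.
no_below, no_above = 2, 0.5
max_docs = no_above * len(docs)
kept = sorted(t for t, df in doc_freq.items() if no_below <= df <= max_docs)
token2id = {token: idx for idx, token in enumerate(kept)}


def doc2bow_sketch(doc):
    """Return sorted (token_id, count) pairs for the tokens that survived filtering."""
    counts = Counter(t for t in doc if t in token2id)
    return sorted((token2id[t], c) for t, c in counts.items())


print(kept)                     # ['court', 'health']
print(doc2bow_sketch(docs[3]))  # [(0, 1), (1, 1)]
print(doc2bow_sketch(docs[2]))  # []
```

Here "election" is dropped for being too common (3 of 4 documents) and the one-off tokens for being too rare, so the third document ends up with an empty bag of words.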


model.print_topics(num_words=15)

[(0,
  '0.019*"trump" + 0.009*"clinton" + 0.009*"president" + 0.006*"campaign" + 0.006*"obama" + 0.005*"u" + 0.005*"state" + 0.004*"house" + 0.004*"court" + 0.003*"republican" + 0.003*"country" + 0.003*"party" + 0.003*"election" + 0.003*"democrats" + 0.003*"political"'),
 (1,
  '0.004*"police" + 0.004*"music" + 0.004*"even" + 0.003*"back" + 0.003*"year" + 0.003*"get" + 0.003*"world" + 0.003*"way" + 0.003*"still" + 0.003*"much" + 0.003*"album" + 0.002*"song" + 0.002*"every" + 0.002*"know" + 0.002*"made"'),
 (2,
  '0.004*"students" + 0.004*"know" + 0.004*"really" + 0.004*"get" + 0.004*"think" + 0.004*"make" + 0.004*"percent" + 0.004*"even" + 0.004*"school" + 0.004*"food" + 0.004*"study" + 0.004*"world" + 0.003*"university" + 0.003*"lot" + 0.003*"things"'),
 (3,
  '0.009*"u" + 0.007*"trump" + 0.007*"comey" + 0.006*"president" + 0.005*"fbi" + 0.004*"investigation" + 0.004*"told" + 0.004*"sessions" + 0.004*"police" + 0.003*"government" + 0.003*"zika" + 0.003*"two" + 0.003*"last" + 0.003*"russia" + 0.003*"department"'),
 (4,
  '0.004*"news" + 0.004*"u" + 0.003*"team" + 0.003*"state" + 0.003*"reports" + 0.003*"ailes" + 0.003*"last" + 0.003*"city" + 0.003*"reported" + 0.003*"fox" + 0.003*"world" + 0.003*"according" + 0.003*"climate" + 0.003*"trump" + 0.003*"told"'),
 (5,
  '0.006*"women" + 0.005*"think" + 0.004*"way" + 0.004*"get" + 0.004*"know" + 0.003*"two" + 0.003*"going" + 0.003*"really" + 0.003*"even" + 0.003*"life" + 0.003*"much" + 0.003*"things" + 0.003*"family" + 0.003*"world" + 0.003*"men"'),
 (6,
  '0.010*"health" + 0.006*"care" + 0.005*"tax" + 0.005*"patients" + 0.005*"law" + 0.004*"year" + 0.004*"u" + 0.004*"percent" + 0.004*"insurance" + 0.003*"company" + 0.003*"state" + 0.003*"get" + 0.003*"federal" + 0.003*"may" + 0.003*"million"')]


lda_display = pyLDAvis.gensim_models.prepare(model, corpus, dictionary, sort_topics=True)
pyLDAvis.display(lda_display)


d = dictionary.doc2bow(["trump", "clinton", "washington"])
topics = [(cluster[0]+1, cluster[1]) for cluster in model.get_document_topics(d)]
topics

[(1, 0.7854711),
 (2, 0.03572736),
 (3, 0.035719372),
 (4, 0.03582063),
 (5, 0.035766896),
 (6, 0.03575087),
 (7, 0.035743773)]
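The list above is a probability distribution over the seven topics for the query document. As a small sketch using the values printed above (with the 1-based ids produced by the `+1` shift), the probabilities should add up to roughly 1, and the dominant topic is simply the pair with the largest probability.

```python
# (topic_id, probability) pairs copied from the output above (1-based ids).
topics = [
    (1, 0.7854711),
    (2, 0.03572736),
    (3, 0.035719372),
    (4, 0.03582063),
    (5, 0.035766896),
    (6, 0.03575087),
    (7, 0.035743773),
]

# The topic probabilities of one document form a distribution.
total = sum(prob for _, prob in topics)

# The dominant topic is the pair with the highest probability.
dominant_topic, dominant_prob = max(topics, key=lambda pair: pair[1])

print(round(total, 4))                # 1.0
print(dominant_topic, dominant_prob)  # 1 0.7854711
```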


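def get_doc_top_n(text_processed, n):
    """Return the probability of topic `n` for a tokenized document,
    or None if the model does not report that topic for the document."""
    d = dictionary.doc2bow(text_processed)
    topics = dict(model.get_document_topics(d))
    return topics.get(n)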


for t in range(0,7):
    top_name = f"topic_{t}"
    data[top_name] = data['Pre-Processed'].apply(lambda doc: get_doc_top_n(doc, t))
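The `NaN` values in the resulting table come from topics that `get_document_topics` does not report for a document (gensim omits topics whose probability falls below the model's `minimum_probability` threshold). If a dense row is preferred, one option is to fill the missing topics explicitly. The sketch below uses the sparse values of one article from the table; the helper `to_dense_row` and the choice of `0.0` as a fill value are assumptions for illustration.

```python
NUM_TOPICS = 7

# Sparse (topic_id, probability) pairs for one article, taken from the table
# (topic_2, topic_4 and topic_6 are the only topics reported for it).
sparse_topics = [(2, 0.880487), (4, 0.066320), (6, 0.051788)]


def to_dense_row(sparse, num_topics=NUM_TOPICS, fill=0.0):
    """Expand sparse (topic_id, prob) pairs into one value per topic."""
    lookup = dict(sparse)
    return [lookup.get(topic_id, fill) for topic_id in range(num_topics)]


row = to_dense_row(sparse_topics)
print(row)  # [0.0, 0.0, 0.880487, 0.0, 0.06632, 0.0, 0.051788]
```

Filling with `0.0` is a simplification: the omitted topics have small but non-zero probability, so the dense row sums to slightly less than 1.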


data


for t in range(7):
    print(f"*********************************** TOPIC {t} ***********************************")
    topic = f"topic_{t}"
    # Rank the articles by their probability for the current topic
    for i, articulo in enumerate(data.sort_values(topic, ascending=False)['Article'].values[:5]):
        print(f"Artículo #{i}")
        print(articulo[:500])
        print()
    print()

*********************************** TOPIC 0 ***********************************
Artículo #0
The political revolution that Bernie Sanders began may still be felt at the ballot box this November even if he’s not the Democratic nominee for president. The Vermont senator is beginning to expand his political network by helping upstart progressive congressional candidates and state legislators, lending his fundraising prowess and national fame to boost their bids. And win or lose for the White House hopeful, Sanders’s candidacy has given them a prominent national messenger and new energy the

Artículo #1
As both parties struggle with unity this election, more   endorsements seem to be coming every day. Several prominent Republicans announced this week that they plan to vote for Hillary Clinton and at least one   Democrat has backed Donald Trump. Crossing over isn’t new  —   there have been Obama Republicans, Reagan Democrats and a number of other defectors across the years. Here’s a list of some notable Republicans and Democrats who have endorsed the other party’s candidate this election and in 

Artículo #2
Sometimes it pays to have a boring day job. Even those who oppose Supreme Court nominee Merrick Garland concede that getting people’s blood boiling over his record is difficult. That’s in part because of the court he has served on for 19 years. Three of the current Supreme Court justices came from the same court where Garland now sits as chief judge  —   the U. S. Court of Appeals for the District of Columbia. ”We think of it as the second most important court in the land, but in fact it is the 

Artículo #3
Why would Russian President Vladimir Putin want to help Donald Trump win the White House? That’s the accusation from Democrats this week, after embarrassing internal Democratic National Committee emails appeared on Wikileaks on the eve of the party’s convention in Philadelphia. The emails were lifted earlier this year in a hacking breach that security experts have linked to Russian espionage groups. As part of their pushback against the emails’ damning details, many Democrats accuse Putin of try

Artículo #4
Updated at 1:49 p. m. ET Saturday with confirmation from the U. S. official and comments from Sen. Ron Wyden, Updated at 3:20 p. m. ET Saturday with comments from Sen. Angus King, The CIA has concluded that Russia intervened in the 2016 election specifically to help Donald Trump win the presidency, a U. S. official has confirmed to NPR. ”Before, there was confidence about the fact that Russia interfered,” the official says. ”But there was low confidence on what the direction and intentionality o



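Ranking articles per topic only makes sense if each topic is sorted by its own column. As a pandas-free sketch, the rows below reuse a few `topic_0` and `topic_1` values from the table (with `None` standing in for `NaN`); the helper `top_docs` is hypothetical.

```python
# A few rows from the table: article index plus two topic columns.
rows = [
    {"id": 4219, "topic_0": 0.219641, "topic_1": 0.769151},
    {"id": 9971, "topic_0": 0.995678, "topic_1": None},
    {"id": 487, "topic_0": None, "topic_1": 0.945222},
    {"id": 566, "topic_0": 0.997309, "topic_1": None},
    {"id": 9157, "topic_0": None, "topic_1": 0.159549},
]


def top_docs(rows, column, k=2):
    """Return the ids of the k rows with the highest value in `column`, skipping missing values."""
    scored = [r for r in rows if r[column] is not None]
    ranked = sorted(scored, key=lambda r: r[column], reverse=True)
    return [r["id"] for r in ranked[:k]]


print(top_docs(rows, "topic_0"))  # [566, 9971]
print(top_docs(rows, "topic_1"))  # [487, 4219]
```

Different columns yield different top articles, which is what one would expect to see under each topic heading.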
	Article
4219	Chinese artist Ai Weiwei has had several confr...
11124	Editor’s note: Updated Nov. 21 at 11:15 a. m. ...
9971	Saying that Alabama Chief Justice Roy Moore vi...
487	There’s no room for ambivalence when you perfo...
73	Every year in the U. S. more than 30, 000 peop...

	Article	Pre-Processed
4219	Chinese artist Ai Weiwei has had several confr...	[chinese, artist, ai, weiwei, several, confron...
11124	Editor’s note: Updated Nov. 21 at 11:15 a. m. ...	[editor, note, updated, nov, comment, nissan, ...
9971	Saying that Alabama Chief Justice Roy Moore vi...	[saying, alabama, chief, justice, roy, moore, ...
487	There’s no room for ambivalence when you perfo...	[room, ambivalence, perform, bob, dylan, maste...
73	Every year in the U. S. more than 30, 000 peop...	[every, year, u, people, die, things, related,...

	Article	Pre-Processed	topic_0	topic_1	topic_2	topic_3	topic_4	topic_5	topic_6
4219	Chinese artist Ai Weiwei has had several confr...	[chinese, artist, ai, weiwei, several, confron...	0.219641	0.769151	NaN	NaN	NaN	NaN	NaN
11124	Editor’s note: Updated Nov. 21 at 11:15 a. m. ...	[editor, note, updated, nov, comment, nissan, ...	NaN	NaN	0.880487	NaN	0.066320	NaN	0.051788
9971	Saying that Alabama Chief Justice Roy Moore vi...	[saying, alabama, chief, justice, roy, moore, ...	0.995678	NaN	NaN	NaN	NaN	NaN	NaN
487	There’s no room for ambivalence when you perfo...	[room, ambivalence, perform, bob, dylan, maste...	NaN	0.945222	NaN	0.051077	NaN	NaN	NaN
73	Every year in the U. S. more than 30, 000 peop...	[every, year, u, people, die, things, related,...	NaN	NaN	NaN	NaN	0.117448	NaN	0.879797
...	...	...	...	...	...	...	...	...	...
566	President Trump’s inner circle got one more me...	[president, trump, inner, circle, got, one, me...	0.997309	NaN	NaN	NaN	NaN	NaN	NaN
8664	The U. S. State Department is dismissing a new...	[u, state, department, dismissing, newspaper, ...	0.722618	NaN	NaN	NaN	NaN	NaN	0.275523
1084	Here’s one side of the resume of the CIA’s new...	[one, side, resume, cia, new, gina, haspel, de...	0.973770	NaN	NaN	0.024513	NaN	NaN	NaN
11879	Shirley Jackson was a fairly famous writer in ...	[shirley, jackson, fairly, famous, writer, sho...	NaN	NaN	NaN	NaN	NaN	0.996126	NaN
9157	Lace front, true believers! RuPaul’s Drag Race...	[lace, front, true, believers, rupaul, drag, r...	NaN	0.159549	NaN	NaN	NaN	0.839049	NaN

Topic modeling

NLP - Strategic Data Analytics

Feedback on workshop 8

🚵‍♀️ Next class we will cover visualizations for NLP

⌛ In the previous class

Today we continue with clustering models

🚀 Today we will cover...

🤖 Topic modeling

LDA - Latent Dirichlet Allocation

What does LDA do?

Using topic modeling we would obtain a result like:

What is LDA?

(Quick refresher) Multinomial distribution

Dirichlet distribution

👮‍♀️ Checkpoint: Which distribution is best suited to the news case?

A distribution over distributions

$N$ topics, $N-1$ dimensions

Two Dirichlet distributions

Iterative process to find the best configuration

LDA - Latent Dirichlet Allocation

👩‍💻 Hands on

Things to consider:

Recap:

Workshop time!

Next class: Visualizations for NLP