marin

Reputation: 953

How can I see all documents per topic in LDA?

I am using LDA to find the themes in a large body of text. I managed to print the topics, but I would also like to print each text together with its topic.

Data:

it's very hot outside summer
there are not many flowers in winter
in the winter we eat hot food
in the summer we go to the sea
in winter we used many clothes
in summer we are on vacation
winter and summer are two seasons of the year

I tried sklearn and I can print the topics, but I would like to print all the phrases belonging to each topic:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import numpy as np
import pandas

dataset = pandas.read_csv('data.csv', encoding = 'utf-8')
comments = dataset['comments']
comments_list = comments.values.tolist()

vect = CountVectorizer()
X = vect.fit_transform(comments_list)

# n_topics was renamed to n_components in scikit-learn 0.19
lda = LatentDirichletAllocation(n_components = 2, learning_method = "batch", max_iter = 25, random_state = 0)

document_topics = lda.fit_transform(X)

sorting = np.argsort(lda.components_, axis = 1)[:, ::-1]
# use vect.get_feature_names() on scikit-learn versions before 1.0
feature_names = np.array(vect.get_feature_names_out())

# rank documents by their weight for the second topic (column 1 of document_topics)
docs = np.argsort(document_topics[:, 1])[::-1]
for i in docs[:4]:
    print(' '.join(i) + '\n')

Desired output:

Topic 1
it's very hot outside summer
in the summer we go to the sea
in summer we are on vacation
winter and summer are two seasons of the year

Topic 2
there are not many flowers in winter
in the winter we eat hot food
in winter we used many clothes
winter and summer are two seasons of the year

Upvotes: 1

Views: 398

Answers (1)

ladybug

Reputation: 602

To print the documents themselves, index into comments_list with each document index inside the loop, rather than joining the index itself:

print(" ".join(comments_list[i].split(",")[:2]) + "\n")
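To get every document listed under its topic, as in the desired output, one option is to group the documents by the weights in the document-topic matrix. The following is only a sketch: the helper names `group_documents_by_topic` and `print_documents_per_topic` and the `threshold` parameter are made up for illustration and are not part of sklearn. It assumes `document_topics` has one row per document and one column per topic, which is the shape `lda.fit_transform(X)` returns.

```python
def group_documents_by_topic(documents, document_topics, threshold=None):
    """Map each topic index to the list of documents assigned to it.

    document_topics: one row of topic weights per document.
    threshold=None : each document goes to its single strongest topic.
    threshold=t    : a document is listed under every topic with weight >= t,
                     so it may appear under several topics (or none).
    """
    groups = {}
    for doc, weights in zip(documents, document_topics):
        weights = list(weights)
        # make sure every topic has an entry, even if it ends up empty
        for topic in range(len(weights)):
            groups.setdefault(topic, [])
        if threshold is None:
            # index of the largest weight (first one wins on ties)
            best = max(range(len(weights)), key=weights.__getitem__)
            groups[best].append(doc)
        else:
            for topic, weight in enumerate(weights):
                if weight >= threshold:
                    groups[topic].append(doc)
    return groups


def print_documents_per_topic(documents, document_topics, threshold=None):
    """Print a 'Topic N' header followed by the documents in that topic."""
    groups = group_documents_by_topic(documents, document_topics, threshold)
    for topic in sorted(groups):
        print("Topic %d" % (topic + 1))
        for doc in groups[topic]:
            print(doc)
        print()
```

With the variables from the question this would be called as `print_documents_per_topic(comments_list, document_topics)`, or with something like `threshold=0.4` if a sentence such as "winter and summer are two seasons of the year" should be allowed to show up under both topics. Which sentences end up under which topic depends on the fitted model, so the grouping may not match the desired output exactly.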

Upvotes: 1
