How to get the topics probability of a specific document using scikit learn?

Question

I want to apply LDA to a set of documents. It is supposed to compute the probability that a document belongs to a certain topic. I did the following:

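from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_vectorizer = TfidfVectorizer(min_df=12, analyzer="word")
tfidf = tfidf_vectorizer.fit_transform(data_samples)
# n_components replaces the older n_topics argument in current scikit-learn
lda = LatentDirichletAllocation(n_components=5, max_iter=5,
                                learning_method='online',
                                learning_offset=50.,
                                random_state=0)
lda.fit(tfidf)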

Now I would like to get the probability that a document in my data_samples belongs to each topic. For example, since I used 5 topics: [0.2, 0.1, 0.1, 0.1, 0.5]. The documentation for LDA is pretty weak. Do you know if this information is easily accessible?

Question: I have the same question. Did anyone figure this out? I don't know why it doesn't let me add a comment here, but it let me add one to someone else's post.

Hernan C. Vazquez · Accepted Answer

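I had the same issue recently. You can apply your fitted model to each sample using: lda.transform(tfidf)

This returns one row per document, with one value per topic. Note that you need to pass the tfidf matrix (the vectorizer output) for that, not the raw documents.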

I think the name "transform" comes from the statistical concept of data transformation.
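
To make the answer concrete, here is a minimal sketch of the full round trip. The toy corpus, the choice of 2 topics, and the names doc_topic, new_doc and new_topic are assumptions made for illustration; they are not from the question. It also assumes a recent scikit-learn, where the argument is n_components and transform returns normalized rows.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for data_samples from the question.
data_samples = [
    "the cat sat on the mat",
    "dogs and cats are common pets",
    "stock markets fell sharply today",
    "investors sold shares as markets dropped",
    "the dog chased the cat around the garden",
    "central banks raised interest rates",
]

# min_df is left at its default because the toy corpus is tiny.
tfidf_vectorizer = TfidfVectorizer(analyzer="word")
tfidf = tfidf_vectorizer.fit_transform(data_samples)

# 2 topics is an arbitrary choice for this small example.
lda = LatentDirichletAllocation(n_components=2, max_iter=5,
                                learning_method="online",
                                learning_offset=50.0,
                                random_state=0)
lda.fit(tfidf)

# One row per document, one column per topic.
doc_topic = lda.transform(tfidf)
print(doc_topic.shape)        # (6, 2)
print(doc_topic[0])           # topic distribution of the first document
print(doc_topic.sum(axis=1))  # each row should sum to roughly 1

# An unseen document must go through the same fitted vectorizer first.
new_doc = tfidf_vectorizer.transform(["my cat likes the garden"])
new_topic = lda.transform(new_doc)
print(new_topic)
```

To pick the most likely topic for each document, doc_topic.argmax(axis=1) gives the column index of the largest value in each row.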
