Reputation: 2790
I want to apply LDA to a set of documents. It's supposed to compute the probability that a document belongs to a certain topic. I did the following:
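from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tfidf_vectorizer = TfidfVectorizer(min_df=12, analyzer="word")
tfidf = tfidf_vectorizer.fit_transform(data_samples)
# Note: newer scikit-learn releases call this parameter n_components instead of n_topics.
lda = LatentDirichletAllocation(n_topics=5, max_iter=5,
                                learning_method='online',
                                learning_offset=50.,
                                random_state=0)
lda.fit(tfidf)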
Now I would like to get the probability that each document in data_samples belongs to a given topic. For example, since I used 5 topics: [0.2, 0.1, 0.1, 0.1, 0.5].
The documentation for LDA is pretty weak. Do you know if this information is easily accessible?
Question: I have the same question. Did anyone figure this out? I don't know why it doesn't let me add a comment here, but it let me add one to someone else's post.
Upvotes: 1
Views: 1190
Reputation: 166
I had the same issue recently. You can apply your fitted model to each sample using: lda.transform(tfidf)
Note that you need to pass the tf-idf matrix (tfidf) for that, not the raw documents. The result has one row per document and one column per topic.
I think the name "transform" comes from the statistical concept of data transformation.
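To make that concrete, here is a minimal sketch of the whole flow. The toy corpus is hypothetical and only stands in for data_samples, min_df is left at its default because the corpus is tiny, and I am assuming a scikit-learn release where the number of topics is passed as n_components. The explicit row normalization is a precaution, since I am not certain every release returns rows that already sum to 1.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy corpus standing in for data_samples.
data_samples = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell as markets closed",
    "investors sold stocks and bonds",
    "the dog chased the cat",
    "bond markets rallied today",
]

tfidf_vectorizer = TfidfVectorizer(analyzer="word")
tfidf = tfidf_vectorizer.fit_transform(data_samples)

# Assumption: n_components is the name of the topic-count parameter
# in the installed scikit-learn version.
lda = LatentDirichletAllocation(n_components=5, max_iter=5,
                                learning_method="online",
                                learning_offset=50.,
                                random_state=0)
lda.fit(tfidf)

# One row per document, one column per topic.
doc_topic = lda.transform(tfidf)

# Normalize each row so it can be read as a probability distribution,
# in case the installed version returns unnormalized weights.
doc_topic = doc_topic / doc_topic.sum(axis=1, keepdims=True)

print(doc_topic.shape)           # (number of documents, number of topics)
print(doc_topic[0])              # topic weights for the first document
print(doc_topic.argmax(axis=1))  # most likely topic index per document
```

Each row of doc_topic is then the kind of vector asked about in the question, and doc_topic.argmax(axis=1) gives the single most likely topic per document if that is all you need.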
Upvotes: 3