Reputation: 703
I am new to natural language processing. I have a list of blog titles, for example (Not real data, but you get the point):
docs = ["Places to Eat", "Places to Visit", "Top 10 Things to Do in Singapore"]...
There are a little over 3000 titles, and I want to use LDA in Python to generate topics for each of these titles. Assuming that I have already cleaned and tokenised these texts using the nltk package and removed the stopwords, I will end up with:
texts = [["places","eat"],["places","visit"]]...
I then proceed to convert these texts into Bag-of-words:
from gensim import corpora, models
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
Corpus data looks like this:
[(0, 1), (1, 1)]...
Model creation:
import gensim
ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics=30, id2word = dictionary, passes=20)
How do I use this model to generate a list of topics (for example "Eat", "Visit", etc.) for each of these titles? I understand that the output might contain probabilities, but I would like to string the topics together as text only.
Upvotes: 2
Views: 2096
Reputation: 1508
You can retrieve the list of topics from a gensim LDA model with
ldamodel.show_topics()
and then classify a document, new or existing, with
ldamodel.get_document_topics(doc)
where doc is the document's bag-of-words vector, for example one element of corpus or the result of dictionary.doc2bow(tokens).
https://radimrehurek.com/gensim/models/ldamodel.html
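To get text-only labels per title, the two calls above can be combined. The following is a minimal sketch, not the only way to do it: the helper name label_documents and the choice of "top words of the single most probable topic" are my own assumptions. It relies only on get_document_topics and on show_topic, which returns (word, probability) pairs for one topic.

```python
def label_documents(model, corpus, topn=3):
    """For each bag-of-words document, return the top words of its most probable topic.

    `model` is expected to behave like a trained gensim LdaModel;
    `label_documents` itself is a hypothetical helper, not part of gensim.
    """
    labels = []
    for bow in corpus:
        # get_document_topics returns a list of (topic_id, probability) pairs
        doc_topics = model.get_document_topics(bow)
        if not doc_topics:
            # No topic passed the model's minimum probability threshold
            labels.append([])
            continue
        best_topic, _ = max(doc_topics, key=lambda pair: pair[1])
        # show_topic returns (word, probability) pairs; keep only the words
        top_words = [word for word, _ in model.show_topic(best_topic, topn=topn)]
        labels.append(top_words)
    return labels
```

With the names from the question, label_documents(ldamodel, corpus) should return one list of words per title, in the same order as docs. Note that with very short titles and 30 topics the assignments can be noisy, so it may be worth trying a smaller num_topics.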
Upvotes: 3