Ivan

Reputation: 703

How to generate a topic from a list of titles using LDA (Python)?

I am new to natural language processing. I have a list of blog titles, for example (not real data, but you get the point):

docs = ["Places to Eat", "Places to Visit", "Top 10 Things to Do in Singapore"]...

There are over 3000 titles, and I want to use LDA in Python to generate topics for each of these titles. Assuming that I have already cleaned and tokenised these texts using the nltk package and removed the stopwords, I end up with:

texts = [["places","eat"],["places","visit"]]...
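For readers who want to reproduce `texts`, here is a minimal stdlib-only sketch of the cleaning step. It is a stand-in for the nltk pipeline mentioned above, not the asker's code: the tokenizer is a simple regex, and `STOPWORDS` and `clean_title` are hypothetical names, with `STOPWORDS` being a tiny hand-picked list rather than nltk's English stopword corpus.

```python
import re

# Hypothetical, deliberately tiny stopword list (assumption); nltk's
# stopwords.words("english") is much larger.
STOPWORDS = {"to", "do", "in", "the", "a", "of", "and"}


def clean_title(title):
    """Lowercase a title, split it into word tokens and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]


docs = ["Places to Eat", "Places to Visit", "Top 10 Things to Do in Singapore"]
texts = [clean_title(title) for title in docs]
print(texts)
# [['places', 'eat'], ['places', 'visit'], ['top', '10', 'things', 'singapore']]
```

Swapping in nltk's tokenizer and stopword list would only change how `clean_title` builds its token list; the output shape (one token list per title) stays the same.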

I then convert these texts into a bag-of-words representation:

from gensim import corpora, models
dictionary = corpora.Dictionary(texts)

corpus = [dictionary.doc2bow(text) for text in texts]

The first document in the corpus looks like this:

[(0, 1), (1, 1)]...
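`corpus` holds one such list per title, each made of `(token_id, count)` pairs. To make that structure concrete, here is a plain-Python approximation of what `Dictionary` and `doc2bow` compute. It is an illustration only: `build_vocab` and `to_bow` are hypothetical helpers, and the exact ids gensim assigns may differ from this sketch, so use the real `dictionary` in practice.

```python
from collections import Counter


def build_vocab(texts):
    """Assign an integer id to each distinct token, in order of first appearance."""
    token2id = {}
    for text in texts:
        for token in text:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id


def to_bow(text, token2id):
    """Return sorted (token_id, count) pairs, skipping tokens not in the vocabulary."""
    counts = Counter(token2id[tok] for tok in text if tok in token2id)
    return sorted(counts.items())


texts = [["places", "eat"], ["places", "visit"]]
token2id = build_vocab(texts)
corpus = [to_bow(text, token2id) for text in texts]
print(corpus)  # [[(0, 1), (1, 1)], [(0, 1), (2, 1)]]
```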

Model creation:

import gensim
ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics=30, id2word=dictionary, passes=20)

How do I use this model to generate a list of topics, for example "Eat", "Visit", etc., for each of these titles? I understand that the output may contain probabilities, but I would like to keep only the topic words as text.

Upvotes: 2

Views: 2096

Answers (1)

John R

Reputation: 1508

You can retrieve the list of topics the model has learned with

ldamodel.show_topics()

and then get the topic distribution of a document with

ldamodel.get_document_topics(doc)

where doc is a bag-of-words vector, such as dictionary.doc2bow(tokens).
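To get the plain-text labels the question asks for, one approach is to take each title's most probable topic from `get_document_topics` and keep only the words that `show_topic` returns for that topic, dropping the probabilities. The helper below is a hedged sketch: `label_title` and its `topn` parameter are hypothetical names, and it relies only on `dictionary.doc2bow`, `LdaModel.get_document_topics` (which yields `(topic_id, probability)` pairs) and `LdaModel.show_topic` (which yields `(word, probability)` pairs). It does not import gensim itself; you pass in the `dictionary` and `ldamodel` objects from the question.

```python
def label_title(tokens, dictionary, ldamodel, topn=3):
    """Return the top words of the most probable topic for one tokenised title.

    Assumes gensim-style objects: dictionary.doc2bow(tokens) gives a
    bag-of-words vector, ldamodel.get_document_topics(bow) gives
    (topic_id, probability) pairs, and ldamodel.show_topic(topic_id, topn)
    gives (word, probability) pairs.
    """
    bow = dictionary.doc2bow(tokens)
    doc_topics = ldamodel.get_document_topics(bow)
    if not doc_topics:
        # Defensive: no topic was returned above the minimum_probability cutoff.
        return []
    # Pick the topic with the highest probability for this title.
    best_topic, _prob = max(doc_topics, key=lambda pair: pair[1])
    # Keep only the words, dropping their probabilities.
    return [word for word, _weight in ldamodel.show_topic(best_topic, topn=topn)]
```

With the question's objects, `labels = [" ".join(label_title(tokens, dictionary, ldamodel)) for tokens in texts]` would give one space-separated string of topic words per title. Because each title has only two or three tokens, topic assignments for such short documents can be noisy, so treat these labels with some caution.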

https://radimrehurek.com/gensim/models/ldamodel.html

Upvotes: 3
