Arman

Reputation: 907

How to implement Latent Dirichlet Allocation to give bigrams/trigrams in topics instead of unigrams

I used the gensim LDAModel for topic extraction for customer reviews as follows:

import gensim
from gensim import corpora

# clean_reviews is a list of tokenized reviews (one list of tokens per review)
dictionary = corpora.Dictionary(clean_reviews)
dictionary.filter_extremes(keep_n=11000)  # change filters
dictionary.compactify()
dictionary_path = "dictionary.dict"
dictionary.save(dictionary_path)

# convert tokenized documents to bag-of-words vectors

corpus = [dictionary.doc2bow(doc) for doc in clean_reviews]

# Train LDA with the number of topics set to 20 (which can be changed)

lda = gensim.models.LdaModel(corpus, id2word = dictionary,
                        num_topics = 20,
                        passes = 20,
                        random_state=1,
                        alpha = "auto")

This returns unigrams in topics like:

topic1 - delivery, parcel, location

topic2 - app, login, access

But I am looking for ngrams. I came across sklearn's LatentDirichletAllocation which uses Tfidf vectorizer as follows:

from sklearn import decomposition
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(analyzer='word', ngram_range=(2, 5), stop_words='english', min_df=2)
X = vectorizer.fit_transform(new_review_list)
clf = decomposition.LatentDirichletAllocation(n_components=20, random_state=3, doc_topic_prior=.1).fit(X)

Here the range of n-grams can be specified in the vectorizer. Is it possible to do the same with the gensim LDA model?

Sorry, I'm very new to using all these models, so don't know much about them.

Upvotes: 1

Views: 4203

Answers (1)

Virashree Patel

Reputation: 11

I know this is an old thread, but I thought I would share what I did to get k-grams in topics. I wanted to include bigrams, trigrams, and quadgrams in my vocabulary. For this purpose, I used gensim's Phrases class before running the LDA model. Here is a really good resource:

https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/#15visualizethetopicskeywords

I have done something similar. Hope this helps.

Upvotes: 1
