How to extract bigram topics instead of unigrams using Latent Dirichlet Allocation (LDA) in Python (gensim)?

Question

Original LDA output

Unigrams
- topic1: scuba, water, vapor, diving
- topic2: dioxide, plants, green, carbon

Desired output: bigram topics
- topic1: scuba diving, water vapor
- topic2: green plants, carbon dioxide

Any idea?

Thomas N T · Accepted Answer

You can use word2vec to find the terms most similar to the top-n topic terms extracted with LDA.

LDA Output

Create a dictionary of bigrams from the extracted topics (for example, san_francisco).

Then train word2vec and query it for the most similar words (unigrams, bigrams, etc.).

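Word and cosine distance (nearest neighbors of san_francisco; despite the label, the word2vec tool reports cosine similarity, so higher means closer)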

los_angeles (0.666175)
golden_gate (0.571522)
oakland (0.557521)
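The scores above are cosine similarities between word vectors: the dot product divided by the product of the vector norms, so 1.0 means the same direction and 0.0 means orthogonal. A plain-Python sketch of the formula and of ranking neighbors by it; the vectors and the helper names `cosine_similarity` and `nearest` are made up for illustration and are not real word2vec embeddings.

```python
import math


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query, vectors, topn=3):
    """Rank the other words by cosine similarity to `query`, highest first."""
    q = vectors[query]
    scored = [(w, cosine_similarity(q, v)) for w, v in vectors.items() if w != query]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:topn]


# Hypothetical 3-d vectors, chosen only to illustrate the ranking.
vectors = {
    "san_francisco": [0.9, 0.1, 0.0],
    "los_angeles": [0.8, 0.3, 0.0],
    "oakland": [0.5, 0.5, 0.1],
    "carbon": [0.0, 0.1, 0.9],
}

for word, score in nearest("san_francisco", vectors):
    print(f"{word} ({score:.6f})")
```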

See https://code.google.com/p/word2vec/ (section "From words to phrases and beyond").