Reputation: 276
I have a list of bags of words for two classes: n items in class A and m items in class B. I want to use topic modeling (LDA) with the gensim package in Python to train a model for class A vs. class B. I am new to both topic modeling and Python. Does anyone know how I should do this? Should I merge all the bags for each class and then use gensim, or should I use the bag for each item separately? Thanks!
Upvotes: 1
Views: 2719
Reputation: 440
If I understand you correctly you want to compare documents from two sources.
One way to do this with Gensim would be to train a single LDA model on the documents from both classes together.
You can then see the topic distribution for each document and determine how similar two documents are using Gensim's similarity methods.
For details, take a look at Gensim's tutorials. The only modification you'd need to make is to combine your documents from A and B into one larger corpus and save the indices of each class somewhere so that you can compare them easily later.
However, depending on your data and your goal, other variants of LDA (such as the correlated topic model) may be more suitable.
Upvotes: 1