Reputation: 185
I have the same error as this thread: ValueError: cannot compute LDA over an empty collection (no terms), but the solution I need isn't the same.
I'm working on a notebook with Sklearn, and I've already done an LDA and an NMF there.
I'm now trying to do the same using Gensim: https://radimrehurek.com/gensim/auto_examples/tutorials/run_lda.htm
Here is the Python code from my notebook showing what I'm trying to do:
dic = gensim.corpora.Dictionary(texts_lem)
dic.filter_extremes(no_below=10, no_above=0.8)
corpus = [dic.doc2bow(doc) for doc in texts_lem]
model = gensim.models.LdaModel(
corpus=corpus,
id2word=dic.id2token,
num_topics=10,
)
I'm reusing the existing texts_lem list from another section of my notebook for the Gensim LDA. I'm following the guide: create a dictionary, filter extremes, create a corpus, and pass it to LdaModel().
Unfortunately, it doesn't work, and commenting out the filter_extremes line doesn't help (that was the answer in the other thread with the same error).
texts_lem is a list of lists of words, like the following:
[
['word', 'word', 'word', 'word'],
['word', 'word', 'word', 'word'],
['word', 'word', 'word', 'word'],
]
My error is :
ValueError: cannot compute LDA over an empty collection (no terms)
Many thanks for your help.
Upvotes: 0
Views: 1306
Reputation: 61
As shown in the gensim LDA tutorial, you need to "load" the dictionary before passing dictionary.id2token to the LdaModel. Using your example, the code should be:
dic = gensim.corpora.Dictionary(texts_lem)
dic.filter_extremes(no_below=10, no_above=0.8)
corpus = [dic.doc2bow(doc) for doc in texts_lem]
# Make an index-to-word dictionary.
temp = dic[0]  # This lookup is only here to "load" (populate) dic.id2token.
id2word = dic.id2token
model = gensim.models.LdaModel(
corpus=corpus,
id2word=id2word,
num_topics=10,
)
This is because id2token is initialized lazily to save memory (it is not created until needed). You can refer to the documentation here.
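To make the lazy behaviour concrete, here is a minimal sketch in plain Python. ToyDictionary is a hypothetical stand-in written for illustration only; it is not gensim's code, and the sample texts are made up. It assumes the same pattern as described above: the id-to-token map starts empty and is only filled on the first lookup by id.

```python
class ToyDictionary:
    """Hypothetical stand-in for a dictionary with a lazily built id -> token map."""

    def __init__(self, documents):
        self.token2id = {}
        self.id2token = {}  # stays empty until the first lookup by id
        for doc in documents:
            for token in doc:
                # Assign the next free integer id to each unseen token.
                self.token2id.setdefault(token, len(self.token2id))

    def __len__(self):
        return len(self.token2id)

    def __getitem__(self, token_id):
        # Build the reverse mapping only when it is missing or out of date.
        if len(self.id2token) != len(self.token2id):
            self.id2token = {i: t for t, i in self.token2id.items()}
        return self.id2token[token_id]


texts = [["apple", "banana"], ["banana", "cherry"]]  # made-up sample data
dic = ToyDictionary(texts)

empty_before = len(dic.id2token)  # reverse map has not been built yet
_ = dic[0]                        # first lookup by id triggers the build
filled_after = len(dic.id2token)

print(empty_before, filled_after, len(dic))
```

If gensim's Dictionary behaves the same way, passing id2token before any lookup hands LdaModel an empty mapping, which would explain the "no terms" error even though the corpus itself is not empty.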
Upvotes: 2
Reputation: 185
Just don't use id2token; pass the dictionary itself as id2word.
Your model should be:
model = gensim.models.LdaModel(
corpus=corpus,
id2word=dic,  # the Dictionary object, not dic.id2token
num_topics=10,
)
This works fine. Who knows what's going on?
Upvotes: 0