Reputation: 115
I can't work out how to print the output of the code below:
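import gensim

# make gensim dictionary and corpus (boc_texts is a list of token lists, one per document)
dictionary = gensim.corpora.Dictionary(boc_texts)
corpus = [dictionary.doc2bow(boc_text) for boc_text in boc_texts]
# fit the TF-IDF model, then transform the bag-of-words corpus with it
tfidf = gensim.models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]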
I want to print the keyphrases and their TF-IDF scores.
Thank you.
Upvotes: 2
Views: 2248
Reputation: 154
I was working with the same code, which I found in a blog post, and ran into the same problem as you.
Here is the entire code: https://gist.github.com/bbengfort/efb311aaa1b52814c284d3b21ae752d6
Basically, you just need to add:
# needed by the loop below, if not already imported at the top of the script
import heapq
from operator import itemgetter

if __name__ == '__main__':
    tfidfs, id2word = score_keyphrases_by_tfidf(texts)
    fileids = texts.fileids()

    # Print top keywords by TF-IDF
    for idx, doc in enumerate(tfidfs):
        print("Document '{}' key phrases:".format(fileids[idx]))
        # Get top 20 terms by TF-IDF score
        for wid, score in heapq.nlargest(20, doc, key=itemgetter(1)):
            print("{:0.3f}: {}".format(score, id2word[wid]))
        print("")
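If you only have the variables from the question and not the blog post's score_keyphrases_by_tfidf function, the same idea should work directly on corpus_tfidf. Here is a minimal sketch of it. The helper name top_terms and the sample data are my own, made up for illustration so that it runs without gensim. It assumes each document is a list of (term_id, score) pairs, which is what you get when iterating over tfidf[corpus], and that the dictionary can be indexed by term id.

```python
import heapq
from operator import itemgetter


def top_terms(doc_tfidf, id2word, n=20):
    """Return the n highest-scoring (term, score) pairs for one document.

    doc_tfidf is a list of (term_id, score) pairs; id2word maps a
    term id back to its string (hypothetical helper, not part of gensim).
    """
    best = heapq.nlargest(n, doc_tfidf, key=itemgetter(1))
    return [(id2word[term_id], score) for term_id, score in best]


# Hand-made stand-in data so the sketch runs on its own.
# With the question's variables you would loop over corpus_tfidf
# and pass dictionary as id2word instead.
id2word = {0: "machine learning", 1: "neural network", 2: "data"}
corpus_tfidf = [
    [(0, 0.8), (2, 0.6)],
    [(1, 0.9), (2, 0.1), (0, 0.4)],
]

for idx, doc in enumerate(corpus_tfidf):
    print("Document {} key phrases:".format(idx))
    for term, score in top_terms(doc, id2word, n=2):
        print("{:0.3f}: {}".format(score, term))
    print("")
```

In the question's code that would mean calling top_terms(doc, dictionary) for each doc in corpus_tfidf. I have not run it against the exact blog code, so treat it as a starting point.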
Upvotes: 3