MrStewart

Reputation: 81

Python NLTK FreqDist - Listing words with a frequency greater than 1000

I'm trying to output every word that appears in my tokens more than 1000 times and save those words to freq1000.

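from nltk import FreqDist

# Flatten tokens (assumed here to be a list of word lists) into one list of words.
newtokens = []
for words in tokens:
    newtokens += words

fd_1 = FreqDist(newtokens)

# Look up each word's count by indexing; FreqDist has no count() method.
freq1000 = []
for i in fd_1:
    if fd_1[i] > 1000:
        freq1000.append(i)
        print(i)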

This is my current code. I'm completely stuck after this, and I'm not sure whether there is a FreqDist function I can use to help. I have saved the FreqDist to fd_1 successfully; I'm just unsure how to output the words that appear more than 1000 times and save them to freq1000.

I would appreciate any help you can provide.

Upvotes: 2

Views: 1850

Answers (1)

Arun AK

Reputation: 4370

You can filter the words by their frequency count using FreqDist.items(), as below. This returns a list of (word, count) pairs for the words that appear more than 1000 times:

list(filter(lambda x: x[1] > 1000, fd_1.items()))
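
If only the words themselves are needed, without their counts, a list comprehension over items() is one option. The sketch below is a minimal illustration, not the only way to do it. It uses collections.Counter as a stand-in, since nltk.FreqDist is a Counter subclass and supports the same items() call; the helper name words_above and the toy sentence are hypothetical and not part of NLTK or the question.

```python
from collections import Counter  # nltk.FreqDist subclasses Counter, so items() behaves the same


def words_above(freq, threshold=1000):
    """Return the words whose count is strictly greater than threshold."""
    return [word for word, count in freq.items() if count > threshold]


# Toy data standing in for the real tokens (hypothetical), with a low threshold
# so that the effect is visible on a short sentence.
sample = Counter("the cat and the dog and the bird".split())
freq_demo = words_above(sample, threshold=1)
print(freq_demo)  # ['the', 'and']
```

With the real data, the equivalent call would be freq1000 = words_above(fd_1), which keeps the default threshold of 1000.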

Hope it helps :)

Upvotes: 1
