Reputation: 8483
I have this piece of code and was wondering whether there is any built-in way to do it faster.
words holds a simple tokenized string input.
freq_unigrams = nltk.FreqDist(words)
unigram_list = []
count = 0
for x in freq_unigrams.keys():
    unigram_list.append(x)
    count += 1
    if count >= 1000:
        break
Upvotes: 0
Views: 124
Reputation: 20869
This is theoretically more efficient:
import itertools
unigram_list = list(itertools.islice(freq_unigrams.iterkeys(), 1000))
...than working off freq_unigrams.keys(), because you are only interested in the first 1000 keys, while freq_unigrams.keys() also has to populate all of the remaining keys in an intermediate list.
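Note that iterkeys() only exists on Python 2 dictionaries; in Python 3, keys() already returns a lazy view, so the same idea can be sketched like this (using a plain dict as a stand-in for the FreqDist):

```python
from itertools import islice

# hypothetical stand-in for an nltk.FreqDist
freq_unigrams = {"the": 50, "a": 30, "cat": 10, "sat": 5}

# islice stops after the requested number of keys, so the rest of the
# dictionary is never traversed (limited to 2 here for illustration)
unigram_list = list(islice(freq_unigrams.keys(), 2))
```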
Upvotes: 1
Reputation: 414915
If your intent is to get the top 1000 most frequent words in the words
list you could try:
import collections
# get top words and their frequencies
most_common = collections.Counter(words).most_common(1000)
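If you only want the words themselves, without the counts, a small sketch (the sample words list is made up):

```python
import collections

# hypothetical sample input
words = ["the", "cat", "sat", "on", "the", "mat", "the", "cat"]

# most_common(n) returns (word, count) pairs, highest frequency first
most_common = collections.Counter(words).most_common(3)

# drop the counts to get a plain list of the top words
top_words = [word for word, count in most_common]
```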
Upvotes: 1
Reputation: 74685
I suggest:
unigram_list = freq_unigrams.keys()
unigram_list[:] = unigram_list[:1000]
This truncates the list returned by keys() in place, rather than binding the name to the extra copy that unigram_list = freq_unigrams.keys()[:1000] creates.
Although this might be better with iterators:
from itertools import islice
unigram_list[:] = islice(freq_unigrams.iterkeys(), 1000)
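The point about in-place slice assignment can be illustrated with a plain list (a sketch, independent of FreqDist):

```python
# slice assignment keeps the same list object and replaces its contents,
# so other names bound to the list see the truncation too
unigram_list = ["a", "b", "c", "d"]
alias = unigram_list           # second name bound to the same object
unigram_list[:] = unigram_list[:2]

print(alias is unigram_list)   # prints True: identity is preserved
print(alias)                   # prints ['a', 'b']
```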
Upvotes: 1
Reputation: 7845
A little late...
To take the first 1000 keys in your dictionary and assign them to a new list:
unigram_list = freq_unigrams.keys()[:1000]
Upvotes: 0
Reputation: 44674
Does freq_unigrams.keys()
return a list? If so, how about the following:
unigram_list = freq_unigrams.keys()[:1000]
This gives you a list containing the first 1000 elements of freq_unigrams.keys()
, with no looping.
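One caveat: this relies on keys() returning a list, which is only true on Python 2. In Python 3, keys() returns a view that does not support slicing, so you would convert first (a sketch with a plain dict standing in for the FreqDist):

```python
# hypothetical stand-in for freq_unigrams
freq_unigrams = {"the": 3, "cat": 2, "sat": 1}

# Python 3: keys() views cannot be sliced, so materialise a list first
# (iterating the dict directly yields its keys; limited to 2 here)
unigram_list = list(freq_unigrams)[:2]
```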
Upvotes: 4