Reputation: 3324
I have a function that is meant to return the least frequent n percent of words in my data. The function is:
def bottomnpercent(table,n):
    words=0
    wordcounter=Counter()
    for key, data in table.scan():
        if not key in stopwords:
            words+=1
            wordcounter[key]+= getsomedata
    idx=percentage(n,words)
    return Counter(wordcounter.most_common()[-idx:])
(table.scan loops through an HBase table that has a word and a frequency count; getsomedata does a lookup that returns the count for a particular word.)
The problem is this returns a counter of the form:
Counter({('stopped', 173): 1, ('thrilling', 17): 1, ('fluids', 18): 1, ('Pictures', 18): 1, ('steering', 37): 1,...
which is no good as everything occurs 1 time and I need something like:
Counter({('stopped'): 173, ('thrilling'): 17, ('fluids'): 18, ('Pictures'): 18, ('steering'): 37,...
but I cannot figure out how. Any help is much appreciated. TIA!
Upvotes: 0
Views: 736
Reputation: 107287
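That's because wordcounter is already a Counter (wordcounter=Counter()), and you then wrap its output in another Counter: return Counter(wordcounter.most_common()[-idx:]). most_common() returns a list of (word, count) tuples, so the outer Counter treats each tuple as a single item and counts it once. You just need to return the following: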
return wordcounter.most_common()[-idx:]
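Note that the line above returns a list of (word, count) tuples rather than a Counter. If a Counter keyed by word is what you want, one option is to pass the slice through dict() first. The sketch below is only an illustration: bottom_n_percent is a hypothetical variant that takes an already-filled Counter instead of the HBase table, and the idx calculation is an assumed stand-in for the question's percentage helper, whose exact behaviour is not shown.

```python
from collections import Counter


def bottom_n_percent(wordcounter, n):
    """Return the least frequent n percent of words as a Counter (word -> count)."""
    # Assumed stand-in for the question's percentage(n, words) helper.
    idx = len(wordcounter) * n // 100
    if idx <= 0:
        # Guard: a slice of [-0:] would return the whole list, not an empty one.
        return Counter()
    # most_common() yields (word, count) tuples sorted by descending count;
    # dict() turns those pairs back into a word -> count mapping.
    return Counter(dict(wordcounter.most_common()[-idx:]))


# Made-up sample data for illustration.
counts = Counter({"the": 500, "stopped": 173, "steering": 37,
                  "fluids": 18, "thrilling": 17})

print(bottom_n_percent(counts, 40))
# Wrapping the raw tuples instead reproduces the problem from the question:
print(Counter(counts.most_common()[-2:]))
```

The first print shows a Counter mapping each word to its frequency, while the second shows every (word, count) tuple with a count of 1.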
Upvotes: 1