Reputation: 3
I am trying to count how many times each word from a list occurs across multiple lists, and to get the total number of occurrences. The real list of lists is huge, so I am using a small dummy example:
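from collections import Counter

multiple = [['apple','ball','cat'], ['apple','ball'], ['apple','cat']]  # ... many more sublists
words = ['apple','ball','cat','duck']  # ... many more words
word = 'apple'  # only this single word is counted so far
cnt = Counter()
total = 0
for i in multiple:
    for j in i:
        if word in j:
            cnt[word] += 1
            total += cnt[word]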
I wanted an output like this:
{'apple':3,'ball':2,'cat':2}
Upvotes: 0
Views: 231
Reputation: 25954
You can just feed Counter a generator expression:
from collections import Counter

cnt = Counter(word for sublist in multiple for word in sublist)
print(cnt)                # Counter({'apple': 3, 'ball': 2, 'cat': 2})
print(sum(cnt.values()))  # 7
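A roughly equivalent sketch, swapping in itertools.chain.from_iterable for the nested generator expression to flatten the list of lists. It also shows that, without any filtering, the total is just the combined length of the sublists. The printed dict order assumes Python 3.7+.

```python
from collections import Counter
from itertools import chain

multiple = [['apple', 'ball', 'cat'], ['apple', 'ball'], ['apple', 'cat']]

# chain.from_iterable flattens the list of lists lazily, one word at a time
cnt = Counter(chain.from_iterable(multiple))

# Without filtering, the grand total equals the combined length of the sublists
total = sum(cnt.values())
assert total == sum(len(sublist) for sublist in multiple)

print(dict(cnt))  # {'apple': 3, 'ball': 2, 'cat': 2}  (Python 3.7+ ordering)
print(total)      # 7
```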
I didn't really see the point of your words list; you didn't use it.
If you need to filter out words that are not in words, make words a set, not a list.
words = {'apple','ball','cat','duck'}
cnt = Counter(word for sublist in multiple for word in sublist if word in words)
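Otherwise every word in words check is a linear scan of the list, so you get O(n*m) (roughly quadratic) behavior in what should be an O(n) operation; set membership tests are O(1) on average.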
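A minimal sketch of the filtered version wrapped in a function. The name count_known_words and the extra 'egg' entry in the data are hypothetical, added only to show that words outside words are dropped. The set is built once, outside the generator, so each membership test is a hash lookup.

```python
from collections import Counter

def count_known_words(multiple, words):
    """Count occurrences of the words in `words` across all sublists.

    `count_known_words` is a hypothetical helper name for this sketch.
    """
    known = set(words)  # build the set once: O(1) average membership tests
    return Counter(word for sublist in multiple for word in sublist if word in known)

# 'egg' is hypothetical extra data that is not in `words`
multiple = [['apple', 'ball', 'cat'], ['apple', 'ball', 'egg'], ['apple', 'cat']]
words = ['apple', 'ball', 'cat', 'duck']

cnt = count_known_words(multiple, words)
print(sum(cnt.values()))  # 7 ('egg' is filtered out)
```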
Upvotes: 2
Reputation: 87281
This works in Python 2.7 and Python 3.x:
from collections import Counter

multiple = [['apple','ball','cat'], ['apple','ball'], ['apple','cat']]
words = ['apple','ball','cat','duck']
cnt = Counter()
total = 0
for i in multiple:
    for word in i:
        if word in words:
            cnt[word] += 1
            total += 1
print(cnt)                #: Counter({'apple': 3, 'ball': 2, 'cat': 2})
print(dict(cnt))          #: {'apple': 3, 'ball': 2, 'cat': 2}
print(total)              #: 7
print(sum(cnt.values()))  #: 7
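In Python 2.x you should use .itervalues() instead of .values(), even though both work: in Python 2, .values() builds a full list, while .itervalues() iterates lazily. Python 3 has no .itervalues(); there .values() already returns a lightweight view.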
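Since the real data is said to be huge, here is a sketch that updates one Counter sublist by sublist with Counter.update, so the input can be any iterable, including a generator that produces sublists lazily. The helper name count_streaming and the generator data source are hypothetical, for illustration only.

```python
from collections import Counter

def count_streaming(sublists, words):
    """Update one Counter sublist by sublist (hypothetical helper name).

    Accepts any iterable of sublists, so the full list of lists does not
    have to be held in memory at once.
    """
    known = set(words)
    cnt = Counter()
    for sublist in sublists:
        # update() with an iterable adds 1 for every element it yields
        cnt.update(word for word in sublist if word in known)
    return cnt, sum(cnt.values())

# A generator stands in for a large data source (assumption for illustration)
sublists = (s for s in [['apple', 'ball', 'cat'], ['apple', 'ball'], ['apple', 'cat']])
cnt, total = count_streaming(sublists, ['apple', 'ball', 'cat', 'duck'])
print(total)  # 7
```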
A bit shorter solution, based on roippi's answer:
from collections import Counter

multiple = [['apple','ball','cat'], ['apple','ball'], ['apple','cat']]
cnt = Counter(word for sublist in multiple for word in sublist)
print(cnt)  #: Counter({'apple': 3, 'ball': 2, 'cat': 2})
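If you also want an entry for every word in words, including ones that never occur (such as 'duck'), a small sketch can rely on the fact that indexing a Counter with a missing key returns 0 instead of raising KeyError (and does not insert the key). The per_word dict order shown assumes Python 3.7+.

```python
from collections import Counter

multiple = [['apple', 'ball', 'cat'], ['apple', 'ball'], ['apple', 'cat']]
words = ['apple', 'ball', 'cat', 'duck']

cnt = Counter(word for sublist in multiple for word in sublist)

# cnt[w] is 0 for words that never appear, so every word in `words` gets an entry
per_word = {w: cnt[w] for w in words}
print(per_word)  # {'apple': 3, 'ball': 2, 'cat': 2, 'duck': 0}
```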
Upvotes: 0