Reputation: 2433
I am working on a simple word count problem and trying to figure out whether it can be done using map, filter and reduce exclusively.
Following is an example of a wordRDD (the list used for Spark):
myLst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']
All I need is to count the words and present the result in a tuple format:
counts = [('cats', 1), ('elephants', 1), ('rats', 1), ('rats', 1), ('cats', 1), ('cats', 1)]
I tried with simple map() and lambdas as:
counts = myLst.map(lambda x: (x, <HERE IS THE PROBLEM>))
I might be wrong with the syntax, or maybe just confused. P.S.: This isn't a duplicate question, as the other answers suggest using if/else or list comprehensions.
Thanks for the help.
Upvotes: 1
Views: 2758
Reputation: 5
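You can use map() to build the tuples. Note that len(x) pairs each word with its length, not a count; replace it with 1 to get the (word, 1) pairs asked for:
myLst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']
list(map(lambda x: (x, len(x)), myLst))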
Upvotes: 0
Reputation: 11
If you don't want the full reduce step done for you (which aggregated the counts in SuperSaiyan's answer), you can use map this way:
>>> myLst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']
>>> counts = list(map(lambda s: (s,1), myLst))
>>> print(counts)
[('cats', 1), ('elephants', 1), ('rats', 1), ('rats', 1), ('cats', 1), ('cats', 1)]
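If you do want the totals afterwards, the (word, 1) pairs can be folded with reduce(). This is only a minimal sketch, assuming Python 3.7+ (for dict ordering); the helper name merge is made up for illustration:

```python
from functools import reduce

myLst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']

# Map step: pair every word with a count of 1.
pairs = map(lambda s: (s, 1), myLst)

# Reduce step: merge each (word, n) pair into a new dict of running totals.
# `merge` is a hypothetical helper name, not part of any library.
def merge(acc, pair):
    word, n = pair
    return {**acc, word: acc.get(word, 0) + n}

counts = list(reduce(merge, pairs, {}).items())
print(counts)  # [('cats', 3), ('elephants', 1), ('rats', 2)]
```

Building a new dict on every step keeps the reducer free of side effects, at the cost of extra copying; mutating the accumulator in place would be faster on long lists.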
Upvotes: 0
Reputation: 38899
This doesn't use a lambda, but it gets the job done.
from collections import Counter
c = Counter(myLst)
result = list(c.items())
And the output:
In [21]: result
Out[21]: [('cats', 3), ('rats', 2), ('elephants', 1)]
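On Python 3.7+, c.items() comes back in first-seen order rather than sorted by count, so the ordering shown above may differ on your machine. If you want the pairs sorted by frequency, Counter.most_common() does that explicitly. A small sketch, with myLst repeated so it runs on its own:

```python
from collections import Counter

myLst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']

# most_common() with no argument returns every (word, count) pair,
# ordered from the highest count to the lowest.
result = Counter(myLst).most_common()
print(result)  # [('cats', 3), ('rats', 2), ('elephants', 1)]
```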
Upvotes: 1
Reputation: 44454
You don't need map(..) at all. You can do it with just reduce(..):
>>> from collections import defaultdict
>>> from functools import reduce
>>> def function(obj, x):
...     obj[x] += 1
...     return obj
...
>>> reduce(function, myLst, defaultdict(int)).items()
dict_items([('elephants', 1), ('rats', 2), ('cats', 3)])
You can then iterate over the result.
However, there's a better way of doing it: look into collections.Counter.
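Since the question also mentions filter(), here is one possible way to get the same counts with only map() and filter(). Treat it as a sketch rather than a recommendation: it rescans the list once per distinct word, so it is quadratic in the worst case.

```python
myLst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']

# For each distinct word, filter() keeps the matching entries
# and len() counts them. sorted() just makes the order predictable.
counts = list(map(
    lambda w: (w, len(list(filter(lambda x: x == w, myLst)))),
    sorted(set(myLst)),
))
print(counts)  # [('cats', 3), ('elephants', 1), ('rats', 2)]
```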
Upvotes: 2