Reputation: 115
I have this code in PySpark to count how many times each word occurs.
wordsList = ['cat', 'elephant', 'rat', 'rat', 'cat']
wordsRDD = sc.parallelize(wordsList, 4)
wordPairs = wordsRDD.map(lambda w: (w, 1))  # not shown in the original snippet, but wordPairs is presumably built like this
wordCounts = wordPairs.reduceByKey(lambda x, y: x + y)
print(wordCounts.collect())
# PRINTS --> [('rat', 2), ('elephant', 1), ('cat', 2)]
from operator import add
totalCount = (wordCounts
              .map(<< FILL IN >>)
              .reduce(<< FILL IN >>))
# SHOULD PRINT 5
# wordCounts.values().sum() does the trick, but I want to do this with map() and reduce()
I need to use a reduce() action to sum the counts in wordCounts and then divide by the number of unique words.
* But first I need to map() the pair RDD wordCounts, which consists of (key, value) pairs, to an RDD of values.
This is where I am stuck. I tried the two variants below, but neither of them works:
.map(lambda x:x.values())
.reduce(lambda x:sum(x)))
AND,
.map(lambda d:d[k] for k in d)
.reduce(lambda x:sum(x)))
Any help in this would be highly appreciated!
Upvotes: 2
Views: 26632
Reputation: 4294
As an alternative to map/reduce, you can also use aggregate, which should be even faster:
In [7]: x = sc.parallelize([('rat', 2), ('elephant', 1), ('cat', 2)])
In [8]: x.aggregate(0, lambda acc, value: acc + value[1], lambda acc1, acc2: acc1 + acc2)
Out[8]: 5
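For clarity, here is the same call as a sketch with aggregate's three arguments named; the helper names are just illustrative and not part of the Spark API:
zero_value = 0                                      # initial accumulator

def seq_op(acc, kv):                                # fold one (word, count) pair into the running total within a partition
    return acc + kv[1]

def comb_op(acc1, acc2):                            # merge the totals of two partitions
    return acc1 + acc2

total = x.aggregate(zero_value, seq_op, comb_op)    # 5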
Upvotes: 0
Reputation: 213
Mr. Tompsett, I got this to work also:
from operator import add
x = (w                        # w is (presumably) the wordCounts pair RDD from the question
     .map(lambda x: x[1])
     .reduce(add))
Upvotes: 0
Reputation: 231
Yes, your lambda function in .map takes a tuple x as its argument and returns the second element via x[1] (index 1 of the tuple). You could also take the tuple in as an unpacked pair and return the second element as follows:
.map(lambda (x, y): y)
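Note that this tuple-parameter unpacking syntax works only in Python 2; it was removed in Python 3 (PEP 3113). A rough Python 3-compatible equivalent (the name kv is just illustrative) would be:
.map(lambda kv: kv[1])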
Upvotes: 1
Reputation: 115
Finally I got the answer; it's like this:
totalCount = (wordCounts
              .map(lambda x: x[1])
              .reduce(lambda x, y: x + y))
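Following the question's original goal of dividing the sum by the number of unique words, a minimal sketch of the rest of the computation (assuming wordCounts and totalCount from above; the variable names are just illustrative) might look like:
uniqueWords = wordCounts.count()               # 3 unique words in the example
average = totalCount / float(uniqueWords)      # 5 / 3 ≈ 1.67
print(average)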
Upvotes: 4