BufBills

Reputation: 8113

Spark reduce and map issue

I am doing a small experiment in Spark and I am running into trouble.

wordCounts is: [('rat', 2), ('elephant', 1), ('cat', 2)]


# TODO: Replace <FILL IN> with appropriate code
from operator import add
totalCount = (wordCounts
              .map(lambda x: (x,1))   <==== something wrong with this line maybe
              .reduce(sum))           <==== something wrong with this line maybe
average = totalCount / float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count())
print totalCount
print round(average, 2)

# TEST Mean using reduce (3b)
Test.assertEquals(round(average, 2), 1.67, 'incorrect value of average')
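
For reference, reduce calls its function with two arguments at a time (accumulator, element), while the built-in sum expects a single iterable, so sum is not a valid reducing function. A minimal sketch of the distinction in plain Python 2, without Spark:

from operator import add

# reduce(f, seq) calls f(accumulator, element), so f must take two arguments.
print reduce(add, [2, 1, 2])    # 5 (reduce is a builtin in Python 2)
# reduce(sum, [2, 1, 2])        # TypeError: sum expects an iterable, not two ints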

Upvotes: 1

Views: 1624

Answers (3)

Nim J

Reputation: 1033

Another similar way: you can also read the list as (key, value) pairs and use distinct():

from operator import add
totalCount = (wordCounts
              .map(lambda (k, v): v)   # Python 2 tuple unpacking: keep only the count
              .reduce(add))            # sum the counts
average = totalCount / float(wordCounts.distinct().count())
print totalCount
print round(average, 2)
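
Note that the tuple-unpacking lambda (k, v) above is Python 2 only; a sketch of the same idea that also runs under Python 3, assuming the same wordCounts RDD, would index into the pair instead:

from operator import add
totalCount = (wordCounts
              .map(lambda kv: kv[1])   # index into the pair instead of unpacking
              .reduce(add))
average = totalCount / float(wordCounts.distinct().count())
print(totalCount)
print(round(average, 2))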

Upvotes: 0

BufBills

Reputation: 8113

I figured out my solution:

from operator import add
totalCount = (wordCounts
              .map(lambda x: x[1])    # keep only the count from each (word, count) pair
              .reduce(add))           # sum the counts
average = totalCount / float(wordsRDD.map(lambda x: (x, 1)).reduceByKey(add).count())
print totalCount
print round(average, 2)
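
As an aside, PySpark's RDD API also offers values(), sum(), and mean(), which express the same computation more directly; a sketch, assuming wordCounts holds one (word, count) pair per unique word:

totalCount = wordCounts.values().sum()    # add up all the counts
average = wordCounts.values().mean()      # mean count per unique word
print totalCount
print round(average, 2)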

Upvotes: 2

R21

Reputation: 396

I am not sure myself, but from looking at your code I can see some issues. The built-in map function cannot be called on a plain list like list_name.map(some stuff); you need to call it as variable = map(function, arguments), and if you're using Python 3, you would need variable = list(map(function, arguments)). Hope that helps somewhat :)
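
To illustrate with the question's data as a plain Python list (this applies to the built-in map only; a Spark RDD, unlike a list, does expose .map() as a method):

pairs = [('rat', 2), ('elephant', 1), ('cat', 2)]

# Built-in map is a free function, not a list method:
counts = map(lambda kv: kv[1], pairs)          # Python 2: returns a list
# counts = list(map(lambda kv: kv[1], pairs))  # Python 3: map returns an iterator
print counts                                   # prints [2, 1, 2]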

Upvotes: 1
