Reputation: 8113
I am doing a small experiment in Spark and I am having trouble.
wordCounts is: [('rat', 2), ('elephant', 1), ('cat', 2)]
# TODO: Replace <FILL IN> with appropriate code
from operator import add
totalCount = (wordCounts
              .map(lambda x: (x,1))   <==== something wrong with this line maybe
              .reduce(sum))           <==== something wrong with this line maybe
average = totalCount / float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count())
print totalCount
print round(average, 2)
# TEST Mean using reduce (3b)
Test.assertEquals(round(average, 2), 1.67, 'incorrect value of average')
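If it matters, my understanding is that reduce expects a two-argument combining function, while sum expects a single iterable, so passing sum to reduce mixes up the two calling conventions. A plain-Python sketch of the difference (the counts are made up for illustration):

from operator import add

counts = [2, 1, 2]                # made-up values, just for illustration
# reduce needs a two-argument combiner and applies it pairwise
print reduce(add, counts)         # ((2 + 1) + 2) -> 5
# sum takes a whole iterable in one call; reduce would invoke it as
# sum(a, b), which Python reads as sum(iterable=a, start=b)
print sum(counts)                 # 5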
Upvotes: 1
Views: 1624
Reputation: 1033
Another similar way: you can also read the pairs as (key, value) and use distinct():
from operator import add
totalCount = (wordCounts
              .map(lambda (k, v): v)   # keep the count from each (key, value) pair
              .reduce(add))
average = totalCount / float(wordCounts.distinct().count())
print totalCount
print round(average, 2)
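Note that lambda (k, v): v relies on tuple-parameter unpacking, which was removed in Python 3; a sketch of the same step that works on either version indexes into the pair instead:

from operator import add

totalCount = (wordCounts
              .map(lambda kv: kv[1])   # kv is a (key, value) pair; keep the value
              .reduce(add))
average = totalCount / float(wordCounts.distinct().count())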
Upvotes: 0
Reputation: 8113
I figured out my solution:
from operator import add
totalCount = (wordCounts
              .map(lambda x: x[1])   # x is a (word, count) pair; keep the count
              .reduce(add))          # sum all the counts
# divide by the number of unique words (the same pipeline that built wordCounts)
average = totalCount / float(wordsRDD.map(lambda x: (x, 1)).reduceByKey(add).count())
print totalCount
print round(average, 2)
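As an aside, PySpark pair RDDs also expose values(), and numeric RDDs expose mean(), so the same average can be sketched in one line (assuming wordCounts is the (word, count) RDD from the question):

average = wordCounts.values().mean()   # pull out the counts, let Spark average them
print round(average, 2)                # 1.67 for the sample data above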
Upvotes: 2
Reputation: 396
I myself am not sure, but from looking at your code I can see some issues. Python's built-in map cannot be called as a method on a list like list_name.map(some stuff); you need to call it as variable = map(function, iterable), and if you're using Python 3 you would need variable = list(map(function, iterable)) to materialize the result. Hope that helps somewhat :)
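A quick sketch of that calling convention with Python's built-in map (the list here is made up for the example):

counts = [2, 1, 2]
doubled = map(lambda x: x * 2, counts)   # map is called as a function, not a method
print list(doubled)                      # [4, 2, 4]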
Upvotes: 1