Reputation: 17
This code gives me an "int object is not subscriptable" error, even though it works for a friend of mine. The error occurs on the 4th line, where I use reduceByKey to calculate the average. Why is this?
nonNullRDD = marchRDD.filter(lambda row: row.journal).filter(lambda row: row.abstract)
abstractRDD = nonNullRDD.map(lambda field: (field.journal, field.abstract))
splitRDD = abstractRDD.map(lambda word: (word[0], len(word[1].split(" "))))
groupedRDD = splitRDD.reduceByKey(lambda x, y: (x[0]+y[0], x[1]+y[1])).mapValues(lambda x: x[0]/x[1])
Upvotes: 0
Views: 218
Reputation: 42342
The lambda function you pass to reduceByKey acts on the values of the RDD, and each value is an integer produced by len(word[1].split(" ")). You tried to do x[0] on an integer, which results in the error you got.
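The failure can be reproduced without Spark. This is a minimal sketch with made-up word counts, where functools.reduce stands in for the pairwise merging that reduceByKey performs on the values of one key:

```python
from functools import reduce

# Made-up word counts for a single journal, i.e. plain integers
# like the ones len(word[1].split(" ")) produces.
counts = [120, 95, 143]

def merge(x, y):
    # Same body as the lambda in the question: it indexes into x and y.
    return (x[0] + y[0], x[1] + y[1])

try:
    reduce(merge, counts)
    error_message = None
except TypeError as exc:
    # Indexing an int raises TypeError.
    error_message = str(exc)

print(error_message)
```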
I believe the RDD should be in the form (key, (value, 1)), so that the fourth line of your code gives the average for each key. To achieve that, you can change the lambda function in the third line to:
splitRDD = abstractRDD.map(lambda word: (word[0], (len(word[1].split(" ")), 1)))
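To check the (sum, count) pattern locally, here is a pure-Python sketch. The helper reduce_by_key is a hypothetical stand-in for RDD.reduceByKey, and the journal names and abstracts are made up:

```python
from collections import defaultdict
from functools import reduce

def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for RDD.reduceByKey: merge values per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}

# Made-up (journal, abstract) records.
records = [
    ("Nature", "a b c d"),
    ("Nature", "a b"),
    ("Science", "a b c"),
]

# Each value is a (word_count, 1) pair, so the merge function can index into it.
pairs = [(journal, (len(abstract.split(" ")), 1)) for journal, abstract in records]

# Same merge function as the fourth line of the question.
totals = reduce_by_key(pairs, lambda x, y: (x[0] + y[0], x[1] + y[1]))

# Equivalent of mapValues(lambda x: x[0] / x[1]).
averages = {journal: total / n for journal, (total, n) in totals.items()}
print(averages)
```

Note that a key with only one record is never passed to the merge function, so the (value, 1) pair has to be built in the map step rather than inside the reduce.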
Upvotes: 1