Reputation: 151
I currently have an RDD that stores key-value pairs, where each key is the 2D index of an array element and the value is the number at that position. For example: [((0,0),1),((0,1),2),((1,0),3),((1,1),4)]. I want to add each key's value to the values at the surrounding positions. In the example above, I want to add up 1, 2 and 3 and store the result under the (0,0) key. How would I do this?
Upvotes: 0
Views: 1734
Reputation: 41
I would suggest you do the following:
Define a function that, given a pair (i,j), returns a list containing the input pair (i,j) plus the pairs for the positions surrounding it. For instance, let's say the function is called surrounding_pairs(pair). Then:
surrounding_pairs((0,0)) = [ (0,0), (0,1), (1,0) ]
surrounding_pairs((2,3)) = [ (2,3), (2,2), (2,4), (1,3), (3,3) ]
Of course, you need to be careful and return only valid positions.
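A minimal sketch of such a function, assuming a 4-neighbourhood (left, right, up, down, with no diagonals, as the two examples above suggest) and assuming the array dimensions are known up front. The n_rows and n_cols parameters are hypothetical additions, not part of the original answer:

```python
def surrounding_pairs(pair, n_rows, n_cols):
    """Return pair itself plus its in-bounds left, right, up and down neighbours."""
    i, j = pair
    candidates = [(i, j), (i, j - 1), (i, j + 1), (i - 1, j), (i + 1, j)]
    # Keep only positions that fall inside the n_rows x n_cols grid.
    return [(r, c) for r, c in candidates if 0 <= r < n_rows and 0 <= c < n_cols]
```

To get the one-argument form used in the rest of this answer, the dimensions could be bound first, for example with functools.partial(surrounding_pairs, n_rows=2, n_cols=2).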
Use a flatMap on your RDD as follows:
# kv is a (pos, v) pair; emit v once for pos itself and once for each neighbour
MyRDD = MyRDD.flatMap(lambda kv: [(p, kv[1]) for p in surrounding_pairs(kv[0])])
This will map your RDD from
[((0,0),1),((0,1),2),((1,0),3),((1,1),4)]
to
[((0,0),1),((0,1),1),((1,0),1),
((0,1),2),((0,0),2),((1,1),2),
((1,0),3),((0,0),3),((1,1),3),
((1,1),4),((1,0),4),((0,1),4)]
This way, the value at each position will be "copied" to the neighbour positions.
Finally, just use a reduceByKey to add the corresponding values at each position:
from operator import add
MyRDD = MyRDD.reduceByKey(add)
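To sanity-check the expected result without Spark, the two steps can be mimicked in plain Python on the example data. This is only a local sketch: the list comprehension stands in for flatMap, the dictionary accumulation stands in for reduceByKey(add), and the 2x2 grid size and the neighbour helper are assumptions made for illustration:

```python
from collections import defaultdict

N_ROWS, N_COLS = 2, 2  # assumed grid size for the example data

def surrounding_pairs(pos):
    # Position itself plus its in-bounds left, right, up and down neighbours.
    i, j = pos
    candidates = [(i, j), (i, j - 1), (i, j + 1), (i - 1, j), (i + 1, j)]
    return [(r, c) for r, c in candidates if 0 <= r < N_ROWS and 0 <= c < N_COLS]

data = [((0, 0), 1), ((0, 1), 2), ((1, 0), 3), ((1, 1), 4)]

# flatMap step: copy each value to its own position and to every neighbour.
copied = [(p, v) for pos, v in data for p in surrounding_pairs(pos)]

# reduceByKey(add) step: sum all values that landed on the same position.
totals = defaultdict(int)
for p, v in copied:
    totals[p] += v

result = sorted(totals.items())
print(result)  # [((0, 0), 6), ((0, 1), 7), ((1, 0), 8), ((1, 1), 9)]
```

The (0,0) entry comes out as 1 + 2 + 3 = 6, which matches what the question asks for.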
I hope this makes sense.
Upvotes: 0