Reputation: 1228
I'm a bit new to Spark and I am trying to do a simple mapping.
My data is like the following:
RDD((0, list(tuples)), ..., (19, list(tuples)))
What I want to do is grab the first item in each tuple, so ultimately something like this:
RDD((0, list(first item of each tuple)), ..., (19, list(first item of each tuple)))
Can someone help me out with how to map this?
I'd appreciate it!
Upvotes: 2
Views: 2163
Reputation: 191983
Something like this? The RDD holds key-value ("kv") pairs, so this uses mapValues and maps itemgetter over the values. So, a map within a map :-)
from operator import itemgetter
rdd = sc.parallelize([(0, [(0,'a'), (1,'b'), (2,'c')]), (1, [(3,'x'), (5,'y'), (6,'z')])])
# list(...) is needed on Python 3, where map returns a lazy iterator rather than a list
mapped = rdd.mapValues(lambda v: list(map(itemgetter(0), v)))
Output
mapped.collect()
[(0, [0, 1, 2]), (1, [3, 5, 6])]
Upvotes: 2
Reputation: 30288
You can use mapValues to convert each list of tuples to a list of each tuple's first element (t[0]):
rdd.mapValues(lambda x: [t[0] for t in x])
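If it helps to check the per-value logic without a Spark context, here is a minimal plain-Python sketch. The names data and first_items are hypothetical, and the list comprehension over (key, value) pairs only imitates what mapValues does to each record: keep the key, transform the value.

```python
from operator import itemgetter

# Local stand-in for the RDD contents: (key, list of tuples) pairs.
# This sample data is an assumption for illustration only.
data = [(0, [(0, 'a'), (1, 'b'), (2, 'c')]), (1, [(3, 'x'), (5, 'y'), (6, 'z')])]

def first_items(tuples):
    """Return the first element of every tuple in the list."""
    return [t[0] for t in tuples]

# Imitates rdd.mapValues(first_items): the key is untouched, the value is transformed.
result = [(k, first_items(v)) for k, v in data]

# Same result using itemgetter; list(...) forces the lazy map on Python 3.
alt = [(k, list(map(itemgetter(0), v))) for k, v in data]

print(result)
```

The same first_items function could then be passed to mapValues in place of the lambda.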
Upvotes: 4