Getting the first item of each tuple in a list, for each row, in PySpark

Question

I'm a bit new to Spark and I am trying to do a simple mapping.
My data is like the following:

RDD((0, list(tuples)), ..., (19, list(tuples)))

What I want to do is grab the first item in each tuple, so ultimately something like this:

RDD((0, list(first item of each tuple)), ..., (19, list(first item of each tuple)))

Can someone help me out with how to map this?
I'd appreciate it!

AChampion · Accepted Answer

You can use mapValues to convert each list of tuples into a list of the tuples' first elements (t[0]). The keys are left unchanged:

rdd.mapValues(lambda x: [t[0] for t in x])
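
To see what that lambda does to each row, here is a minimal plain-Python sketch that runs without a Spark cluster. The sample rows and the name first_items are made up for illustration; the list comprehension only stands in for what mapValues does to each (key, value) pair.

```python
def first_items(tuples):
    """Return the first element of every tuple in the list."""
    return [t[0] for t in tuples]

# Hypothetical sample data shaped like the RDD rows: (key, list of tuples).
rows = [
    (0, [("a", 1), ("b", 2)]),
    (19, [("c", 3)]),
]

# Stand-in for rdd.mapValues(first_items): keys stay as-is, only values change.
result = [(key, first_items(value)) for key, value in rows]
print(result)
```

With a real RDD, the same function should work when passed directly, as in rdd.mapValues(first_items). Note that mapValues is lazy, so nothing is computed until an action such as collect() or take(n) is called.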

Answers (2)