cpd1

Reputation: 789

Merge list of lists in pySpark RDD

I have lists of tuples that I want to combine into a single list. I've been able to process the data with lambdas and list comprehensions to the point where I'm close to being able to use reduceByKey, but I'm not sure how to merge the lists. So the format is...

[[(0, 14), (0, 24)], [(1, 19), (1, 50)], ...]

And I would like it to be this way....

[(0, 14), (0, 24), (1, 19), (1, 50), ...]

The code that got me to this point...

# x is (values, key): pair the key with its values, scaling each by local[key]
test = test.map(lambda x: (x[1], [e * local[x[1]] for e in x[0]]))
# expand each record into a list of (key, value) tuples
test = test.map(lambda x: [(x[0], y) for y in x[1]])

But I'm not sure what to do from there to merge the lists.

Upvotes: 6

Views: 5933

Answers (2)

mrsrinivas

Reputation: 35434

You can do,

identity = lambda x: x  # Python has no built-in identity function, so define one
test = test.flatMap(identity)

or

test = test.flatMap(lambda xs: xs)
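
For context, a minimal self-contained sketch of the flatten, assuming a local SparkContext named sc and the sample data from the question (variable names are illustrative):

from pyspark import SparkContext

sc = SparkContext("local", "flatten-example")

# RDD whose records are lists of (key, value) tuples, as in the question
test = sc.parallelize([[(0, 14), (0, 24)], [(1, 19), (1, 50)]])

# flatMap emits every element of each inner list as its own record
flat = test.flatMap(lambda xs: xs)

print(flat.collect())
# [(0, 14), (0, 24), (1, 19), (1, 50)]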

Upvotes: 9

cpd1

Reputation: 789

Thanks to @mrsrinivas for providing the hint...

test = test.flatMap(lambda xs: [(x[0], x[1]) for x in xs])
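
Since each inner element is already a (key, value) tuple, the comprehension simply rebuilds the same tuples, so this is equivalent to the flatMap(lambda xs: xs) form in the answer above.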

Upvotes: 0
