Reputation: 43
I have an RDD object, a list of lists, that looks like this (millions of sublists omitted; only 3 are shown here):
my_tuples = [[('a','b'),('a','c')],
[('b','a'),('b','f'),('b','g')],
[('zzsx','c'), ('zzsx','q'), ('zzsx','m'), ('zzsx','ay'), ('zzsx','bbt')]]
and I want to convert it into a single list like this:
my_list = [('a','b'),('a','c'), ('b','a'),('b','f'),('b','g'),
('zzsx','c'), ('zzsx','q'), ('zzsx','m'), ('zzsx','ay'), ('zzsx','bbt')]
I can't use loops, since my_tuples
is an RDD object and is too large to iterate over locally. I'm new to Spark; any suggestion is appreciated. Thanks.
Upvotes: 4
Views: 1518
Reputation: 45309
You can flatten it using flatMap:
rdd.flatMap(lambda l: l)
Since your elements are lists, you can just return them unchanged from the function, as done in the example. Collecting the result gives:
[('a', 'b'),
('a', 'c'),
('b', 'a'),
('b', 'f'),
('b', 'g'),
('zzsx', 'c'),
('zzsx', 'q'),
('zzsx', 'm'),
('zzsx', 'ay'),
('zzsx', 'bbt')]
Upvotes: 4