Reputation: 73376
I have an RDD called codes, a pair RDD whose key is a string and whose value is another pair:
In [76]: codes.collect()
Out[76]:
[(u'3362336966', (6208, 5320)),
(u'7889466042', (4140, 5268))]
and I am trying to get this:
In [76]: codes.collect()
Out[76]:
[(u'3362336966', 6208),
(u'3362336966', 5320),
(u'7889466042', 4140),
(u'7889466042', 5268)]
How can I do this?
My failed attempt:
In [77]: codes_in = codes.map(lambda x: (x[0], x[1][0]), (x[0], x[1][1]))
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-77-e1c7925bc075> in <module>()
----> 1 codes_in = codes.map(lambda x: (x[0], x[1][0]), (x[0], x[1][1]))
NameError: name 'x' is not defined
Upvotes: 0
Views: 24
Reputation: 18022
I think what you want is the following:
codes_in = codes.map(lambda x: [(x[0], p) for p in x[1]]).flatMap(lambda x: x)
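On Python 2 only, you could unpack the tuple in the lambda's parameter list for legibility (note the parentheses; this syntax was removed in Python 3):
codes_in = codes.map(lambda (k, vs): [(k, v) for v in vs]).flatMap(lambda x: x)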
This extracts each value associated with a key and ensures that every row is a record of the form (k, v).
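As for the NameError in the question: the comma after the first tuple ends the lambda body, so the second tuple is evaluated as a separate argument to map, outside the lambda, where x does not exist. The expansion itself can be sketched in plain Python, without a Spark context, to check the logic. This is only an illustration: expand and codes_local are hypothetical names, and itertools.chain stands in for what flatMap does on an RDD.

```python
from itertools import chain

def expand(record):
    """Turn (key, (v1, v2, ...)) into [(key, v1), (key, v2), ...]."""
    key, values = record
    return [(key, v) for v in values]

# Local stand-in for the contents of the `codes` RDD from the question.
codes_local = [
    (u'3362336966', (6208, 5320)),
    (u'7889466042', (4140, 5268)),
]

# chain.from_iterable flattens the per-record lists into one list,
# which is the same thing flatMap does across the records of an RDD.
codes_in = list(chain.from_iterable(expand(r) for r in codes_local))
print(codes_in)
```

On the actual RDD, the same function should work in a single step as codes.flatMap(expand), and since only the values are being expanded, codes.flatMapValues(lambda vs: vs) is likely the shortest form.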
Upvotes: 1