I want to split the key in a MapReduce step and create new key-value pairs from it.
Current data:
[(u'ab,xy,sc,dr', u'doc1')]
I want to split the key so that each part is paired with the value:
[(u'ab', u'doc1'), (u'xy', u'doc1'), (u'sc', u'doc1'), (u'dr', u'doc1')]
Any help is much appreciated. Thanks!
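from pyspark import SparkContext
sc = SparkContext.getOrCreate()  # or reuse the `sc` provided by the pyspark shell

def process(record):
    # Emit one (key part, value) pair per comma-separated part of the key
    for key in record[0].split(','):
        yield key, record[1]

rdd = sc.parallelize([(u'ab,xy,sc,dr', u'doc1')])
rdd.flatMap(process).collect()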
This results in:
[(u'ab', u'doc1'), (u'xy', u'doc1'), (u'sc', u'doc1'), (u'dr', u'doc1')]
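The reason to use flatMap rather than map is that flatMap flattens the pairs yielded for each record into one list, while map keeps one output element per input record. The sketch below illustrates this locally without Spark; `flat_map`, `plain_map` and `split_key` are hypothetical helpers written only for this illustration (they are not PySpark APIs), and `split_key` additionally calls `strip()`, an assumption added to handle stray spaces like the one in the question's original `'dr ,'`.

```python
from itertools import chain

def split_key(record):
    """Split a (comma-separated key, value) record into one pair per key part."""
    key, value = record
    for part in key.split(','):
        # strip() is an added assumption: it drops stray spaces around each part
        yield part.strip(), value

def flat_map(func, records):
    """Hypothetical local stand-in for RDD.flatMap: apply func, flatten one level."""
    return list(chain.from_iterable(func(r) for r in records))

def plain_map(func, records):
    """Hypothetical local stand-in for RDD.map: one output element per record."""
    # Each generator is materialised as a list so the nesting is visible
    return [list(func(r)) for r in records]

records = [(u'ab,xy,sc,dr', u'doc1')]
print(flat_map(split_key, records))
# [('ab', 'doc1'), ('xy', 'doc1'), ('sc', 'doc1'), ('dr', 'doc1')]
print(plain_map(split_key, records))
# [[('ab', 'doc1'), ('xy', 'doc1'), ('sc', 'doc1'), ('dr', 'doc1')]]
```

With map you would get one nested list per input record; flatMap gives the flat list of pairs that the question asks for.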