sumit

Reputation: 11

Split the key in a map-reduced text file in PySpark

I want to split the key in MapReduce and create new key-value pairs.

Current doc file:

[(u'ab,xy,sc,dr', u'doc1')]

I want to split the key and pair each part with the value, like this:

[(u'ab', u'doc1'), (u'xy', u'doc1'), (u'sc', u'doc1'), (u'dr', u'doc1')]

Any help is much appreciated! Thanks

Upvotes: 1

Views: 1023

Answers (1)

Leo

Reputation: 2072

def process(record):
    # record is a (key, value) pair; split the comma-separated key
    # and emit one (part, value) pair per part
    for key in record[0].split(','):
        yield key, record[1]

# sc is the existing SparkContext
rdd = sc.parallelize([(u'ab,xy,sc,dr', u'doc1')])
rdd.flatMap(process).collect()

will result in

[(u'ab', u'doc1'), (u'xy', u'doc1'), (u'sc', u'doc1'), (u'dr', u'doc1')]
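Equivalently, the same split can be written inline with a lambda instead of a named generator function (a minimal sketch, assuming sc is an existing SparkContext):

# same transformation as process(), expressed as a list comprehension inside flatMap
rdd = sc.parallelize([(u'ab,xy,sc,dr', u'doc1')])
rdd.flatMap(lambda record: [(key, record[1]) for key in record[0].split(',')]).collect()
# [(u'ab', u'doc1'), (u'xy', u'doc1'), (u'sc', u'doc1'), (u'dr', u'doc1')]

Both versions rely on flatMap flattening the per-record list of pairs into a single RDD of key-value pairs.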

Upvotes: 2
