Reputation: 347
I am working on a problem where I have to convert around 7 million list-value pairs to key-value pairs using the map() function in PySpark, where each list in a given list-value pair has at most 20 elements.
For example:
listVal= [(["ank","nki","kit"],21),(["arp","rpi","pit"],22)]
Now, I want key-value pairs as
keyval= [("ank",21),("nki",21),("kit",21),("arp",22),("rpi",22),("pit",22)]
When I write
keyval = listVal.map(lambda x: some_function(x))
where some_function() is defined as:
def some_function(x):
    shingles = []
    for i in range(len(x[0])):
        temp = []
        temp.append(x[0][i])
        temp.append(x[1])
        shingles.append(tuple(temp))
    return shingles
I don't get the desired output, because I think map() returns exactly one output item per input item, so each element becomes a list of pairs rather than several separate pairs. I have tried other things and searched the web but did not find anything related to it.
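To illustrate the behaviour described above, here is a plain-Python sketch where the built-in map() stands in for the RDD's map() (no Spark required); the variable names follow the example data above:

```python
listVal = [(["ank", "nki", "kit"], 21), (["arp", "rpi", "pit"], 22)]

def some_function(x):
    # Pair every string in x[0] with the value x[1].
    return [(s, x[1]) for s in x[0]]

# map() emits exactly one output element per input element,
# so the result is a nested list of pair-lists, not a flat list of pairs.
nested = list(map(some_function, listVal))
print(nested)
# [[('ank', 21), ('nki', 21), ('kit', 21)], [('arp', 22), ('rpi', 22), ('pit', 22)]]
```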
Any help would be appreciated.
Upvotes: 0
Views: 1147
Reputation: 604
Given your constraints, this can be done with PySpark's .flatMap():
def conversion(n):
    return [(x, n[1]) for x in n[0]]

listVal.flatMap(conversion)
or in one line
listVal.flatMap(lambda n: [(x, n[1]) for x in n[0]])
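flatMap() is simply map followed by flattening one level of nesting, so its effect can be modelled locally with itertools.chain (a plain-Python sketch of the semantics, not the Spark API itself):

```python
from itertools import chain

listVal = [(["ank", "nki", "kit"], 21), (["arp", "rpi", "pit"], 22)]

def conversion(n):
    # One list of (key, value) pairs per input element.
    return [(x, n[1]) for x in n[0]]

# Apply the function, then flatten one level -- this is what flatMap does.
keyval = list(chain.from_iterable(map(conversion, listVal)))
print(keyval)
# [('ank', 21), ('nki', 21), ('kit', 21), ('arp', 22), ('rpi', 22), ('pit', 22)]
```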
Upvotes: 1