Reputation: 347
I am working on a problem where I have to convert around 7 million list-value pairs to key-value pairs using the map() function in PySpark, where each list in a given list-value pair has at most 20 elements.
For example:
listVal= [(["ank","nki","kit"],21),(["arp","rpi","pit"],22)]
Now, I want key-value pairs as
keyval= [("ank",21),("nki",21),("kit",21),("arp",22),("rpi",22),("pit",22)]
When I write
keyval = listVal.map(lambda x: some_function(x))
where some_function() is defined as:
def some_function(x):
    shingles = []
    for i in range(len(x[0])):
        temp = []
        temp.append(x[0][i])
        temp.append(x[1])
        shingles.append(tuple(temp))
    return shingles
I don't get the desired output, because I think map() returns exactly one output item per input item, so each element becomes a list of pairs rather than several separate pairs. I have tried other things and searched the web but did not find anything related to it.
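To illustrate the behaviour described above, here is a plain-Python sketch where the built-in map() stands in for the RDD's map() (no Spark required); the variable names follow the example data above:

```python
listVal = [(["ank", "nki", "kit"], 21), (["arp", "rpi", "pit"], 22)]

def some_function(x):
    # Pair every string in x[0] with the value x[1].
    return [(s, x[1]) for s in x[0]]

# map() emits exactly one output element per input element,
# so the result is a nested list of pair-lists, not a flat list of pairs.
nested = list(map(some_function, listVal))
print(nested)
# [[('ank', 21), ('nki', 21), ('kit', 21)], [('arp', 22), ('rpi', 22), ('pit', 22)]]
```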
Any help would be appreciated.
Upvotes: 0
Views: 1147
Reputation: 604
Given your constraints, this can be done with PySpark's .flatMap():
def conversion(n):
    return [(x, n[1]) for x in n[0]]

listVal.flatMap(conversion)
or in one line
listVal.flatMap(lambda n: [(x, n[1]) for x in n[0]])
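flatMap() is simply map followed by flattening one level of nesting, so its effect can be modelled locally with itertools.chain (a plain-Python sketch of the semantics, not the Spark API itself):

```python
from itertools import chain

listVal = [(["ank", "nki", "kit"], 21), (["arp", "rpi", "pit"], 22)]

def conversion(n):
    # One list of (key, value) pairs per input element.
    return [(x, n[1]) for x in n[0]]

# Apply the function, then flatten one level -- this is what flatMap does.
keyval = list(chain.from_iterable(map(conversion, listVal)))
print(keyval)
# [('ank', 21), ('nki', 21), ('kit', 21), ('arp', 22), ('rpi', 22), ('pit', 22)]
```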
Upvotes: 1