peeps

Reputation: 53

How to split a list of dictionaries in one column into two columns in a PySpark DataFrame?

I want to split the filteredaddress column of the Spark DataFrame below into two new columns, Flag and Address:

customer_id | pincode | filteredaddress
1000045801  | 121005  | [{'flag':'0', 'address':'House number 172, Parvatiya Colony Part-2 , N.I.T'}]
1000045801  | 121005  | [{'flag':'1', 'address':'House number 172, Parvatiya Colony Part-2 , N.I.T'}]
1000045801  | 121005  | [{'flag':'1', 'address':'House number 172, Parvatiya Colony Part-2 , N.I.T'}]

Can anyone tell me how I can do this?

Upvotes: 0

Views: 551

Answers (1)

mck

Reputation: 42352

If filteredaddress is a map column, you can get its values by key:

# Look up each key in the map and alias the result as a new column
df2 = df.selectExpr(
    'customer_id', 'pincode',
    "filteredaddress['flag'] as flag", "filteredaddress['address'] as address"
)

Other ways to access map values are:

import pyspark.sql.functions as F

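df.select(
    'customer_id', 'pincode',
    # without alias() the columns would be named filteredaddress[flag] etc.
    F.col('filteredaddress')['flag'].alias('flag'),
    F.col('filteredaddress')['address'].alias('address')
)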

# or, more simply

df.select(
    'customer_id', 'pincode',
    'filteredaddress.flag',
    'filteredaddress.address'
)

Upvotes: 1
