Devarshi Goswami
Devarshi Goswami

Reputation: 1225

unpack variable length dictionary from pandas column and create separate columns

I have a pandas dataframe in which one column custom consists of dictionaries within a list. The list may be empty or have one or more dictionary objects within it. for example...

id    custom
1       []
2       [{'key': 'impact', 'name': 'Impact', 'value': 'abc', 'type': 'string'}, {'key': 'proposed_status','name': 'PROPOSED Status [temporary]', 'value': 'pqr', 'type': 'string'}]
3       [{'key': 'impact', 'name': 'Impact', 'value': 'xyz', 'type': 'string'}]

I'm interested in extracting the data from the JSON into separate columns based on the dict keys named 'key' and 'value'! for example: here, the output df will have additional columns impact and proposed_status:

 id   custom     impact        proposed_status
 1    ...        NA               NA
 2    ...        abc              pqr
 3    ...        xyz              NA   

Could the smart people of StackOverflow please guide me on the right way to solve this? Thanks!

Upvotes: 0

Views: 274

Answers (1)

Rob Raymond
Rob Raymond

Reputation: 31156

The approach is in the comments

df = pd.DataFrame({'id': [1, 2, 3],
 'custom': [[],
  [{'key': 'impact', 'name': 'Impact', 'value': 'abc', 'type': 'string'},
   {'key': 'proposed_status',
    'name': 'PROPOSED Status [temporary]',
    'value': 'pqr',
    'type': 'string'}],
  [{'key': 'impact', 'name': 'Impact', 'value': 'xyz', 'type': 'string'}]]})

# expand out lists, reset_index() so join() will work
df2 = df.explode("custom").reset_index(drop=True)
# join to keep "id"
df2 = (df2.join(df2["custom"]
    # expand embedded dict
    .apply(pd.Series))
    .loc[:,["id","key","value"]]
 # empty list generate spurios NaN, remove
     .dropna()
 # turn key attribute into column
 .set_index(["id","key"]).unstack(1)
 # cleanup multi index columns
 .droplevel(0, axis=1)
)

df.merge(df2, on="id", how="left")

id custom impact proposed_status
0 1 [] nan nan
1 2 [{'key': 'impact', 'name': 'Impact', 'value': 'abc', 'type': 'string'}, {'key': 'proposed_status', 'name': 'PROPOSED Status [temporary]', 'value': 'pqr', 'type': 'string'}] abc pqr
2 3 [{'key': 'impact', 'name': 'Impact', 'value': 'xyz', 'type': 'string'}] xyz nan

Upvotes: 2

Related Questions