How to unstack a pandas column according to a certain format?

Question

My df has a column called unnamed whose first element is

'{company=*, location=world, industry=*, segment=*, feature=*, product=*, basekpi=customer_demand}'

and whose second element is NaN. I would like to unstack this column into 7 columns: company, location, industry, segment, feature, product, and basekpi.

Could you please elaborate on how to do so?

import pandas as pd
unnamed = ['{company=*, location=world, industry=*, segment=*, feature=*, product=*, basekpi=customer_demand}',
           'NaN']
df = pd.DataFrame({'id': [0, 1], 'unnamed': unnamed})
df

Shubham Sharma · Accepted Answer

`Series.str.findall`

We can use Series.str.findall with regex capture groups to extract the key-value pairs from the unnamed column, then turn each row's list of pairs into a dict:

pd.DataFrame(map(dict, df['unnamed'].str.findall(r'([^{=,]+)=([^,}]+)')))

  company  location  industry  segment  feature  product          basekpi
0       *     world         *        *        *        *  customer_demand
1     NaN       NaN       NaN      NaN      NaN      NaN              NaN
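
If the missing entry is a real NaN (for example np.nan) rather than the string 'NaN' used in the question's snippet, str.findall returns NaN for that row and dict() would fail on it. The following is a minimal sketch of one way to guard against that and join the new columns back onto df. The names PATTERN, to_record, parsed and expected_df are illustrative only, and stripping the keys is an added assumption: the key group as written also captures the space that follows each comma.

```python
import numpy as np
import pandas as pd

unnamed = ['{company=*, location=world, industry=*, segment=*, feature=*, product=*, basekpi=customer_demand}',
           np.nan]  # assumption: a real missing value, not the string 'NaN'
df = pd.DataFrame({'id': [0, 1], 'unnamed': unnamed})

PATTERN = r'([^{=,]+)=([^,}]+)'  # same pattern as in the answer

def to_record(pairs):
    # str.findall gives a list of (key, value) tuples, or NaN for a missing row
    if not isinstance(pairs, list):
        return {}
    # remove the space after each comma that the key group picks up
    return {key.strip(): value for key, value in pairs}

records = [to_record(p) for p in df['unnamed'].str.findall(PATTERN)]
parsed = pd.DataFrame(records, index=df.index)

# replace the raw column with the seven parsed columns
expected_df = df.drop(columns='unnamed').join(parsed)
print(expected_df)
```

Passing index=df.index keeps the parsed rows aligned with the original rows, so the join also works when df does not have a default integer index.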

Regex details

([^{=,]+) : First capturing group
- [^{=,]+ : Matches any character not present in the list [{=,] one or more times
= : Matches the = character literally
([^,}]+) : Second capturing group
- [^,}]+ : Matches any character not present in the list [,}] one or more times
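
To see what the two groups capture, the pattern can be run with Python's re module on the sample string. This is only an illustrative sketch: sample, pattern and strict_pattern are hypothetical names, and the variant that also excludes whitespace from the key group is an assumption about the desired column names, not part of the answer.

```python
import re

sample = '{company=*, location=world, industry=*, segment=*, feature=*, product=*, basekpi=customer_demand}'

pattern = r'([^{=,]+)=([^,}]+)'           # pattern from the answer
strict_pattern = r'([^{=,\s]+)=([^,}]+)'  # variant: key group also excludes whitespace

pairs = re.findall(pattern, sample)
strict_pairs = re.findall(strict_pattern, sample)

# The first key class allows spaces, so keys after a comma keep a leading space
print(pairs[:2])         # [('company', '*'), (' location', 'world')]
print(strict_pairs[:2])  # [('company', '*'), ('location', 'world')]
```

With the original pattern the resulting column names are therefore 'company', ' location', ' industry' and so on, which is hard to spot in the printed DataFrame because columns are right-aligned.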

