Akira

Reputation: 2870

How to unstack a pandas column according to a certain format?

My df has a column named unnamed whose first element is

'{company=*, location=world, industry=*, segment=*, feature=*, product=*, basekpi=customer_demand}'

and whose second element is NaN (written as the string 'NaN' in the code below). I would like to unstack this column into 7 columns (company, location, industry, segment, feature, product, and basekpi), giving my expected_df.

Could you please explain how to do so?

import pandas as pd

# Note: the second element is the string 'NaN', not a real missing value
unnamed = ['{company=*, location=world, industry=*, segment=*, feature=*, product=*, basekpi=customer_demand}',
           'NaN']
df = pd.DataFrame({'id': [0, 1], 'unnamed': unnamed})
print(df)

Upvotes: 1

Views: 62

Answers (2)

Shubham Sharma

Reputation: 71687

Series.str.findall

We can use Series.str.findall with a regex containing two capture groups to extract the key-value pairs from the unnamed column, then build a DataFrame from one dict per row:

pd.DataFrame(map(dict, df['unnamed'].str.findall(r'([^{=,]+)=([^,}]+)')))

  company  location  industry  segment  feature  product          basekpi
0       *     world         *        *        *        *  customer_demand
1     NaN       NaN       NaN      NaN      NaN      NaN              NaN
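A minimal sketch of the same findall idea, under two assumptions of mine: the missing row is a real np.nan rather than the string 'NaN', and the keys should have no leading space. With the pattern as written, every key after the first keeps the space that follows the comma (' location', ' industry', ...), and map(dict, ...) raises a TypeError on a real NaN because findall returns NaN for that row. The names matches and records are illustrative only.

```python
import numpy as np
import pandas as pd

unnamed = ['{company=*, location=world, industry=*, segment=*, feature=*, product=*, basekpi=customer_demand}',
           np.nan]  # assumption: a real missing value instead of the string 'NaN'
df = pd.DataFrame({'id': [0, 1], 'unnamed': unnamed})

# One list of (key, value) tuples per row, or NaN for the missing row
matches = df['unnamed'].str.findall(r'([^{=,]+)=([^,}]+)')

# Strip the space left in front of each key; use an empty dict when findall gave NaN
records = [{k.strip(): v for k, v in m} if isinstance(m, list) else {} for m in matches]
expected_df = pd.DataFrame(records, index=df.index)
print(expected_df)
```

The empty dict turns into a row of NaN, matching the second row of the output above.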

Regex details

  • ([^{=,]+) : First capturing group (the key)
    • [^{=,]+ : Matches any character not present in the list [{=,] one or more times
  • = : Matches the = character literally
  • ([^,}]+) : Second capturing group (the value)
    • [^,}]+ : Matches any character not present in the list [,}] one or more times

See the online regex demo
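To see what the two groups capture without pandas, the same pattern can be run with Python's re.findall on the sample string from the question. This is only an illustration; the variables s, pairs and clean are my own names. Note that the first group does not exclude spaces, so the keys after the first one start with a space.

```python
import re

# Sample value from the question's unnamed column
s = '{company=*, location=world, industry=*, segment=*, feature=*, product=*, basekpi=customer_demand}'

# Group 1 runs up to '=' and never contains '{' or ','; group 2 runs up to ',' or '}'
pairs = re.findall(r'([^{=,]+)=([^,}]+)', s)
print(len(pairs))   # 7
print(pairs[:2])    # [('company', '*'), (' location', 'world')]

# Remove the space that follows each comma before using the keys
clean = {k.strip(): v for k, v in pairs}
print(clean)
```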

Upvotes: 3

anky

Reputation: 75100

You can replace the unwanted characters, split, explode, and then unstack:

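# Turn "key=value" into "key:value", drop the braces, split on commas,
# put one item per row, then split each item into [key, value]
s = (df['unnamed'].replace({"=":":","{":"","}":""},regex=True)
     .str.split(",").explode().str.split(":"))
# Column 0 holds the keys and column 1 the values; unstack the keys into columns
u = pd.DataFrame(s.tolist(),s.index).set_index(0,append=True)[1].unstack()
out = df.join(u)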

print(out)

   id                                            unnamed          basekpi  \
0   0  {company=*, location=world, industry=*, segmen...  customer_demand   
1   1                                                NaN              NaN   

   feature  industry  location  product  segment   NaN company  
0        *         *     world        *        *   NaN       *  
1      NaN       NaN       NaN      NaN      NaN  None     NaN  

Upvotes: 2
