Reputation: 2870
My df has a column unnamed whose first element is
'{company=*, location=world, industry=*, segment=*, feature=*, product=*, basekpi=customer_demand}'
and whose second element is NaN. I would like to unstack this column into 7 columns: company, location, industry, segment, feature, product, and basekpi. My expected_df is

Could you please elaborate on how to do so?
import pandas as pd
unnamed = ['{company=*, location=world, industry=*, segment=*, feature=*, product=*, basekpi=customer_demand}',
           'NaN']  # note: this is the string 'NaN', not a real missing value (np.nan)
df = pd.DataFrame({'id': [0, 1], 'unnamed': unnamed})
df
Upvotes: 1
Views: 62
Reputation: 71687
Series.str.findall
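We can use findall with a regex containing two capture groups to extract the key-value pairs from the unnamed column, then pass each row's list of (key, value) tuples to dict and build the DataFrame from the resulting dicts: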
pd.DataFrame(map(dict, df['unnamed'].str.findall(r'([^{=,]+)=([^,}]+)')))
company location industry segment feature product basekpi
0 * world * * * * customer_demand
1 NaN NaN NaN NaN NaN NaN NaN
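Two caveats with the one-liner above. The key class [^{=,]+ also accepts spaces, so every key after the first is captured with the space that follows the comma (' location', ' industry', ...). And if the column holds a real missing value (np.nan) rather than the string 'NaN', str.findall returns NaN for that row and dict(NaN) raises a TypeError. The sketch below is a variation, not the answerer's code: it assumes the question's data but with np.nan as the second element, excludes whitespace from the key group, and maps missing rows to an empty dict.

```python
import numpy as np
import pandas as pd

# Sample data from the question, assuming a real missing value instead of the string 'NaN'
unnamed = ['{company=*, location=world, industry=*, segment=*, feature=*, product=*, basekpi=customer_demand}',
           np.nan]
df = pd.DataFrame({'id': [0, 1], 'unnamed': unnamed})

# Same idea as the answer, but whitespace is excluded from the key group,
# so keys come out as 'location' rather than ' location'
pattern = r'([^{=,\s]+)=([^,}]+)'

pairs = df['unnamed'].str.findall(pattern)

# str.findall returns NaN (not a list) for missing values; map those rows to an empty dict
records = [dict(p) if isinstance(p, list) else {} for p in pairs]

# One column per key; rows with no pairs become all-NaN rows
expanded = pd.DataFrame(records, index=df.index)
out = df.join(expanded)
print(out)
```

The isinstance check also covers the question's original data: for the string 'NaN', findall returns an empty list, and dict([]) is simply {}.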
Regex details
([^{=,]+) : First capturing group (the key)
  [^{=,]+ : Matches any character not in the set [{=,] one or more times
= : Matches the = character literally
([^,}]+) : Second capturing group (the value)
  [^,}]+ : Matches any character not in the set [,}] one or more times
See the online regex demo
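To see what the pattern from the one-liner captures, it can be run with Python's re module on the first element from the question. This small check shows the (key, value) tuples that dict() consumes, including the leading space on every key after the first, and that a string with no '=' (such as 'NaN') produces no matches:

```python
import re

# Pattern from the answer: group 1 is the key, group 2 is the value
pattern = r'([^{=,]+)=([^,}]+)'
s = '{company=*, location=world, industry=*, segment=*, feature=*, product=*, basekpi=customer_demand}'

pairs = re.findall(pattern, s)
print(pairs[:2])  # [('company', '*'), (' location', 'world')]

# dict() turns the list of (key, value) tuples into a mapping;
# note the space kept in front of every key after 'company'
record = dict(pairs)
print(list(record))  # ['company', ' location', ' industry', ' segment', ' feature', ' product', ' basekpi']

# No '=' in the text means no matches, hence the all-NaN second row
print(re.findall(pattern, 'NaN'))  # []
```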
Upvotes: 3
Reputation: 75100
You can replace the unwanted characters (turning = into : and removing the braces), then split, explode, and unstack:
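# Turn "=" into ":" and drop the braces; split on commas, explode to one piece per row,
# then split each piece into [key, value]
s = (df['unnamed'].replace({"=": ":", "{": "", "}": ""}, regex=True)
     .str.split(",").explode().str.split(":"))
# Build a (key, value) frame, move the key into the index, and unstack keys into columns
u = pd.DataFrame(s.tolist(), s.index).set_index(0, append=True)[1].unstack()
out = df.join(u)
print(out)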
id unnamed basekpi \
0 0 {company=*, location=world, industry=*, segmen... customer_demand
1 1 NaN NaN
feature industry location product segment NaN company
0 * * world * * NaN *
1 NaN NaN NaN NaN NaN None NaN
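The output above shows two side effects of this approach: the keys keep the space that followed each comma (which is why 'company', the only key without one, sorts last), and the string 'NaN' becomes a column of its own. A sketch of the same split/explode/unstack idea that strips whitespace and skips pieces without '=' might look like the following; it assumes a real np.nan as the second element and uses Series.str.strip, str.split, and str.contains instead of replace:

```python
import numpy as np
import pandas as pd

# Sample data from the question, assuming a real missing value for the second row
unnamed = ['{company=*, location=world, industry=*, segment=*, feature=*, product=*, basekpi=customer_demand}',
           np.nan]
df = pd.DataFrame({'id': [0, 1], 'unnamed': unnamed})

# Drop the braces, split on commas, one "key=value" piece per row
# (explode repeats the original row label for each piece), then trim spaces
pieces = (df['unnamed']
          .str.strip('{}')
          .str.split(',')
          .explode()
          .str.strip())

# Keep only genuine "key=value" pieces; missing rows and stray text like 'NaN' are dropped
pieces = pieces[pieces.str.contains('=', na=False)]

# Split each piece once on '=' into key and value columns
kv = pieces.str.split('=', n=1, expand=True)
kv.columns = ['key', 'value']

# Append the key to the row label, then unstack the keys into columns
u = kv.set_index('key', append=True)['value'].unstack()
out = df.join(u)  # rows with no pairs get NaN in every new column
print(out)
```

As with the original, unstack sorts the new columns alphabetically; they can be reordered afterwards if the expected column order matters.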
Upvotes: 2