Reputation: 55
Let's say I have a dataframe df
looking like this:
|ColA |
|---------|
|B=7 |
|(no data)|
|C=5 |
|B=3,C=6 |
How do I extract the data into new colums, so it looks like this:
|ColA | B | C |
|------|---|---|
|True | 7 | |
|False | | |
|True | | 5 |
|True | 3 | 6 |
For filling the columns I know I can use regex .extract, as shown in this solution.
But how do I set the Column name at the same time? So far I use a loop over df.ColA.loc[df["ColA"].isna()].iteritems()
, but that does not seem like the best option for a lot of data.
Upvotes: 0
Views: 48
Reputation: 260640
You could use str.extractall
to get the data, then reshape the output and join
to a derivative of the original dataframe:
# create the B/C columns
df2 = (df['ColA'].str.extractall('([^=]+)=([^=,]+),?')
.set_index(0, append=True)
.droplevel('match')[1]
.unstack(0, fill_value='')
)
# rework ColA and join previous output
df.notnull().join(df2).fillna('')
# or if several columns:
df.assign(ColA=df['ColA'].notnull()).join(df2).fillna('')
output:
ColA B C
0 True 7
1 False
2 True 5
3 True 3 6
Upvotes: 1