Jan T.
Jan T.

Reputation: 55

Creating new Columns and fill them based on another columns values

Let's say I have a dataframe df looking like this:

|ColA     |
|---------|
|B=7      |
|(no data)|
|C=5      |
|B=3,C=6  |

How do I extract the data into new colums, so it looks like this:

|ColA  | B | C |
|------|---|---|
|True  | 7 |   |
|False |   |   |
|True  |   | 5 |
|True  | 3 | 6 |

For filling the columns I know I can use regex .extract, as shown in this solution.

But how do I set the Column name at the same time? So far I use a loop over df.ColA.loc[df["ColA"].isna()].iteritems(), but that does not seem like the best option for a lot of data.

Upvotes: 0

Views: 48

Answers (1)

mozway
mozway

Reputation: 260640

You could use str.extractall to get the data, then reshape the output and join to a derivative of the original dataframe:

# create the B/C columns
df2 = (df['ColA'].str.extractall('([^=]+)=([^=,]+),?')
                 .set_index(0, append=True)
                 .droplevel('match')[1]
                 .unstack(0, fill_value='')
       )

# rework ColA and join previous output
df.notnull().join(df2).fillna('')

# or if several columns:
df.assign(ColA=df['ColA'].notnull()).join(df2).fillna('')

output:

    ColA  B  C
0   True  7   
1  False      
2   True     5
3   True  3  6

Upvotes: 1

Related Questions