joasa
joasa

Reputation: 976

Pandas : Create new column based on text values of other columns

My dataframe looks like this:

    id          text                    labels
0   447         glutamine synthetase    [protein]
1   447         GS                      [protein]
2   447         hepatoma                [indication]
3   447         NaN                      NaN
4   442         Metachromatic           [indication]

I want to transform the dataframe and create two new columns named proteins and indications that contain the text when labels is protein or indication for the same id.

Wanted output

    id          protein                     indication
0   447         glutamine synthetase, GS    hepatoma
0   442         NaN                         Metachromatic

Can someone help how to do this?

Upvotes: 0

Views: 55

Answers (1)

Mayank Porwal
Mayank Porwal

Reputation: 34056

Use df.explode with Groupby.agg and df.pivot:

In [417]: out = df.explode('labels').groupby(['id', 'labels'])['text'].agg(','.join).reset_index().pivot('id', 'labels').reset_index().droplevel(0, axis=1).rename_axis(None, axis=1)

In [423]: out.columns = ['id', 'indication', 'protein']

In [424]: out
Out[424]: 
    id     indication                  protein
0  442  Metachromatic                      NaN
1  447       hepatoma  glutamine synthetase,GS

Upvotes: 1

Related Questions