Filling a dataframe column with values from another column, based on values from a third column

Question

I have a pandas dataframe like the following. I created the last 3 columns based on unique values from the column RefIDPrefix.

RefIDPrefix	RefIDNumber	GO	PMID	Reactome
GO	12345
PMID	23456
Reactome	34567
GO	45678
GO	56789
PMID	67890

I want to fill the last 3 columns like the following. Basically, based on the value in RefIDPrefix, I want to take the value in RefIDNumber and put it in the correct column corresponding to RefIDPrefix.

RefIDPrefix	RefIDNumber	GO	PMID	Reactome
GO	12345	12345
PMID	23456		23456
Reactome	34567			34567
GO	45678	45678
GO	56789	56789
PMID	67890		67890

I have been trying to do this for a while but haven't been able to figure out how to go about it. Any help would be appreciated!

SeaBean · Accepted Answer

You can use df.pivot() to build the columns from RefIDPrefix and .join() it back to the original df

df.join(df.pivot(columns='RefIDPrefix', values='RefIDNumber').fillna(''))

Output:

  RefIDPrefix  RefIDNumber       GO     PMID Reactome
0          GO        12345  12345.0                  
1        PMID        23456           23456.0         
2    Reactome        34567                    34567.0
3          GO        45678  45678.0                  
4          GO        56789  56789.0                  
5        PMID        67890           67890.0

Edit

For the display format of the numbers in new columns (currently displayed like float number with decimal point), if your RefIDNumber column is actually in string, the numbers in new column will also be in string and have no decimal point (like integers).

However, if RefIDNumber is in numeric format (most probably in positive numbers for ID numbers), we can retain the numbers as integer by fine-tuning the .fillna() part, as follows:

df.join(df.pivot(columns='RefIDPrefix', values='RefIDNumber').fillna(-1, downcast='infer').replace(-1, ''))

Output:

  RefIDPrefix  RefIDNumber     GO   PMID Reactome
0          GO        12345  12345                
1        PMID        23456         23456         
2    Reactome        34567                  34567
3          GO        45678  45678                
4          GO        56789  56789                
5        PMID        67890         67890

Filling a dataframe column with values from another column, based on values from a third column

Answers (1)

Edit

Related Questions