Create a new column by adding matching cell content in pandas

Question

Hello everyone, I need help merging the content of several columns when a cell contains a specific pattern.

Here is an example:

Species COL1 COL2         COL3           COL4     COL5
SPf_1   4    f_G1         None           None     None
SP1     9    -_Haploviric -_unclassified f_G3     None
SP1     36   k_Orthorn    f_G7           None     None
SP2     90   k_Orthorn    f_G3           p_Pisuvi None
SP3     32   None         None           None     f_83
SP3     2    -_Ribovi     Cattus         None     None
SP4     89   None         None           None     None

I would like to add a new column called F_COL that holds, for each row, the content of the cell that contains an f_ pattern. Note that I only need to check COL1 to COL5, not the Species column, which can also contain f_ patterns.

I should get:

Species COL1 COL2         COL3           COL4     COL5 F_COL
SPf_1   4    f_G1         None           None     None f_G1
SP1     9    -_Haploviric -_unclassified f_G3     None f_G3
SP1     36   k_Orthorn    f_G7           None     None f_G7
SP2     90   k_Orthorn    f_G3           p_Pisuvi None f_G3
SP3     32   None         None           None     f_83 f_83
SP3     2    -_Ribovi     Cattus         None     None NA
SP4     89   None         None           None     None NA

Does anyone have an idea, please?

Here is the data in dictionary format:

{'Species': {0: 'SPf_1', 1: 'SP1', 2: 'SP1', 3: 'SP2', 4: 'SP3', 5: 'SP3', 6: 'SP4'}, 'COL1': {0: 4, 1: 9, 2: 36, 3: 90, 4: 32, 5: 2, 6: 89}, 'COL2': {0: 'f_G1', 1: '-_Haploviric', 2: 'k_Orthorn', 3: 'k_Orthorn', 4: 'None', 5: '-_Ribovi', 6: 'None'}, 'COL3': {0: 'None', 1: '-_unclassified', 2: 'f_G7', 3: 'f_G3', 4: 'None', 5: 'Cattus', 6: 'None'}, 'COL4': {0: 'None', 1: 'f_G3', 2: 'None', 3: 'p_Pisuvi', 4: 'None', 5: 'None', 6: 'None'}, 'COL5': {0: 'None', 1: 'None', 2: 'None', 3: 'None', 4: 'f_83', 5: 'None', 6: 'None'}}

Shubham Sharma · Accepted Answer

Let us filter and stack the columns from COL1 to COL5, then extract the strings that start with the f_ pattern, followed by groupby + first on level=0 to keep the first match per row:

df.filter(regex='COL[1-5]').stack()\
  .str.extract(r'^(f_.*)', expand=False).groupby(level=0).first()

0    f_G1
1    f_G3
2    f_G7
3    f_G3
4    f_83
5    None
6    None
dtype: object
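
For completeness, here is a self-contained sketch of the same idea that also assigns the result back to F_COL. It assumes the data from the question (entered here as plain lists rather than the index-keyed dict), and it adds an astype(str) step that is not in the snippet above, as a precaution so the .str accessor only ever sees strings (COL1 holds integers). The names cols and matches are just illustrative.

```python
import pandas as pd

# Data from the question; the missing cells are the literal string 'None'.
df = pd.DataFrame({
    'Species': ['SPf_1', 'SP1', 'SP1', 'SP2', 'SP3', 'SP3', 'SP4'],
    'COL1': [4, 9, 36, 90, 32, 2, 89],
    'COL2': ['f_G1', '-_Haploviric', 'k_Orthorn', 'k_Orthorn', 'None', '-_Ribovi', 'None'],
    'COL3': ['None', '-_unclassified', 'f_G7', 'f_G3', 'None', 'Cattus', 'None'],
    'COL4': ['None', 'f_G3', 'None', 'p_Pisuvi', 'None', 'None', 'None'],
    'COL5': ['None', 'None', 'None', 'None', 'f_83', 'None', 'None'],
})

# Only COL1..COL5 are searched, so an f_ inside Species is ignored.
cols = ['COL1', 'COL2', 'COL3', 'COL4', 'COL5']

matches = (
    df[cols].astype(str)                     # assumption: cast so every cell is a string
    .stack()                                 # long format, indexed by (row, column)
    .str.extract(r'^(f_.*)', expand=False)   # keep cells starting with f_, NaN otherwise
    .groupby(level=0)                        # regroup by original row
    .first()                                 # first non-null match per row
)

# Assignment aligns on the row index; rows without a match stay missing.
df['F_COL'] = matches
print(df)
```

Rows 5 and 6 end up with a missing value in F_COL. If the literal string NA is wanted there, as in the expected output of the question, chaining .fillna('NA') onto the assignment should do it. Note that the ^ anchor means only cells that start with f_ are picked up; if a cell could carry f_ in the middle of the string, the regex would need adjusting.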
