Josh Fox
Josh Fox

Reputation: 101

Create a New Pandas column in df from multiple columns using different rules

I have the following df

        SUITE      DET 
0       HISTO  HisRep2  
1       HISTO  HisRep3 
2         NaN    store  
3         NaN   OnHold 
4        PMFA   Nemato  
5    bactmate   Nemato 
6      bacto1   abbac4  
7   abbadabba   BhyCPC 
8      HAEMFF   pmbac3
9      INCLIN    Trico 
10     BOCOSP    isol1 

I want to create a third column. The third column will hold a mix of data from the first two columns, using 3 criteria.

I have created code for the 3 criteria individually, but I want to do them simultaneously. My 3 bits are

  1. Create the new column 'SUITEDET' and if the column 'SUITE' contains a Nan/is empty, fill in the new column using data from column 'DET'
df['SUITEDET'] = df[df['SUITE'].isnull()]['DET']

which makes df


        SUITE      DET SUITEDET
0       HISTO  HisRep2      NaN
1       HISTO  HisRep3      NaN
2         NaN    store    store
3         NaN   OnHold   OnHold
4        PMFA   Nemato      NaN
5    bactmate   Nemato      NaN
6      bacto1   abbac4      NaN
7   abbadabba   BhyCPC      NaN
8      HAEMFF   pmbac3      NaN
9      INCLIN    Trico      NaN
10     BOCOSP    isol1      NaN
  1. Create the new column 'SUITEDET' and if the 'DET' column has entries in the list 'usedet',complete the new column using data from the column 'SUITE'
usedet = ['pmbac1','pmbac2','pmbac3','pmbac4','pmbac5','pmbac6','bact1','bact2','bact3','bact4','bact5','bact6','bactAdd1','bactAdd2','bactAdd3','abbac1','abbac2','abbac3','abbac4','abbac5','abbac6','BrachSmear','Brachy','Brachy1','Brachy10','Brachy2','Brachy3','Brachy4','Brachy5','Brachy6','Brachy7','Brachy8','Brachy9','BhyC','isol','salm1','salm2','salm3','salm4','pmavb1','pmavb2','pmavb3','pmavb4','pmavb5','pmavb6','pmavsh','pmavsp','cult','zncult','pmfoBK','pmfood','Coccn','Coccid','CoccidGoat','Strngy','StoidE','Stoide','Nemato','NematE','TrichE','Tricd','Tricm','Trico','Trics','Tricu','Monspp','Fecn','BhyC21','BhyCID','BhyCPC','BhyCPCTrig','BhyClt','isol1','isol1C','isol1F','isol1M','isol1S','isol2','isol2C','isol2F','isol2M','isol2S','isol3','isol3C','isol3F','isol3M','isol3S','isolA','isolB','isolC']

df['SUITEDET'] = df[df['DET'].isin(usedet)]['SUITE']

which creates df

        SUITE      DET   SUITEDET
0       HISTO  HisRep2        NaN
1       HISTO  HisRep3        NaN
2         NaN    store        NaN
3         NaN   Nemato        NaN
4        PMFA   BodyWt        PMFA
5    bactmate   Nemato   bactmate
6      bacto1   abbac4     bacto1
7   abbadabba   BhyCPC  abbadabba
8      HAEMFF   pmbac3     HAEMFF
9      INCLIN    Trico     INCLIN
10     BOCOSP    isol1     BOCOSP
  1. Create the new column 'SUITEDET' and if the 'SUITE' column has entries in the list 'usesuite',complete the new column using data from the column 'SUITE'
usesuite = ["HAEMFF","HAEM4F","INCLIN","BOCOSP","OVCOSP","WECOCF","WECOCA","WECOCP","ECOPCR","BLKWEC","BLKWF","BLKFLK","FLKIND","WECFLK","BHYCULT","BHY21","INCH","LEPTMS","GSH-PX","HISTO","CHLEIAS","CHLEIAG","BVD","BVDANT","BVDAB","CLA","IBRMS","SINFAB","JMILKI","JMILKB","BVDMS","BVDIND","BVDPCR","SBVIMB","SBVIMI","RUMENE","FATLIV","DOWNER","BMMETP","FAMIN","OMETAP","BOVPRO","OPRODP","COPBLK","COPRO","RESPCR"]

df['SUITEDET'] = df[df['SUITE'].isin(usesuite)]['SUITE']

which makes df

        SUITE      DET SUITEDET
0       HISTO  HisRep2    HISTO
1       HISTO  HisRep3    HISTO
2         NaN    store      NaN
3         NaN   OnHold      NaN
4        PMFA   Nemato      NaN
5    bactmate   Nemato      NaN
6      bacto1   abbac4      NaN
7   abbadabba   BhyCPC      NaN
8      HAEMFF   pmbac3   HAEMFF
9      INCLIN    Trico   INCLIN
10     BOCOSP    isol1   BOCOSP

what I want to get out at the end is a df like this, where the third column is completed using the by using the three criteria to select the correct data from the first two columns

        SUITE      DET SUITEDET
0       HISTO  HisRep2    HISTO
1       HISTO  HisRep3    HISTO
2         NaN    store    store
3         NaN   OnHold    OnHold
4        PMFA   Nemato    PMFA
5    bactmate   Nemato    bactmate
6      bacto1   abbac4    bacto1
7   abbadabba   BhyCPC    abbadabba
8      HAEMFF   pmbac3    HAEMFF
9      INCLIN    Trico    INCLIN
10     BOCOSP    isol1    BOCOSP

Should I run the three criteria and create 3dfs and then try and merge them or is there a pythonic way to run the 3 bits together. I don't even need a specific answer if it's a hassle, just an idea of something to go look up. Thanks.

Upvotes: 0

Views: 61

Answers (1)

Muhammad Hassan
Muhammad Hassan

Reputation: 4229

Try:

df['SUITEDET'] = df[df['SUITE'].isnull()]['DET']
condition = df['DET'].isin(usedet) | df['SUITE'].isin(usesuite)
df.loc[condition, 'SUITEDET'] = df['SUITE']

OUTPUT:

        SUITE      DET   SUITEDET
0       HISTO  HisRep2      HISTO
1       HISTO  HisRep3      HISTO
2         NaN    store      store
3         NaN   OnHold     OnHold
4        PMFA   Nemato       PMFA
5    bactmate   Nemato   bactmate
6      bacto1   abbac4     bacto1
7   abbadabba   BhyCPC  abbadabba
8      HAEMFF   pmbac3     HAEMFF
9      INCLIN    Trico     INCLIN
10     BOCOSP    isol1     BOCOSP

Solution 2:

condition = df['DET'].isin(usedet) | df['SUITE'].isin(usesuite)
df['SUITEDET'] = np.where(condition, df['SUITE'], np.where(df['SUITE'].isnull(), df['DET'], np.nan))

Upvotes: 2

Related Questions