Multiple conditions in a Python data frame

Question

My data frame looks like -

id          code    
1            AA
2            BB
3            CC
4            AA
5            GG
6            BB
7            NN
8            YY

My final output should look like -

id          code         group  
1            AA            A
2            BB            B
3            CC            A
4            AA            A
5            GG            G
6            BB            B
7            NN            other
8            YY            G

My code looks like -

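import numpy as np

col         = 'code'
# A code cannot equal two values at once, so combine the tests with | (or), not & (and).
# Each comparison needs its own parentheses because | binds tighter than ==.
conditions  = [ (df[col] == 'AA') | (df[col] == 'CC'), (df[col] == 'GG') | (df[col] == 'YY'), df[col] == 'BB' ]
choices     = [ 'A', 'G', 'B' ]

df["group"] = np.select(conditions, choices, default='other')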

But the code column has a large number of categories, around 40. Some of them belong to A, some to B, some to G, and the rest belong to other. I think I need to create a list of codes for each group in the conditions section and then apply it; otherwise this is very difficult to do with the code above.

jezrael · Accepted Answer

Use Series.map with a dictionary, then replace the unmatched values (which become NaN) with the default value using Series.fillna:

d = {'AA':'A','CC':'A','GG':'G','YY':'G','BB':'B'}

df["group"] = df[col].map(d).fillna('other')
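
For reference, here is a minimal self-contained sketch of the same approach. The sample frame is rebuilt by hand from the table in the question, and the intermediate name mapped is only for illustration:

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: both columns are plain values).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "code": ["AA", "BB", "CC", "AA", "GG", "BB", "NN", "YY"],
})
col = "code"

# One entry per code: code -> group.
d = {"AA": "A", "CC": "A", "GG": "G", "YY": "G", "BB": "B"}

# map() looks each code up in the dict; codes missing from the dict become NaN.
mapped = df[col].map(d)

# fillna() then assigns the default group to those unmatched rows.
df["group"] = mapped.fillna("other")

print(df)
```

With around 40 codes, only the dictionary grows; the two lines that do the mapping stay the same.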

If the dictionary is in a different format, with each group mapped to a list of its codes, first convert it to the code-to-group format used above:

d1 = {'A': ['AA','CC'], 'G':['GG','YY'], 'B':['BB']}

# invert the dict: every code in a list becomes a key that points to its group
#http://stackoverflow.com/a/31674731/2901002
d = {k: oldk for oldk, oldv in d1.items() for k in oldv}
df["group"] = df[col].map(d).fillna('other')
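
If you would rather keep np.select, as in the question, the same group-to-codes dictionary can probably drive it through Series.isin, building one condition per group. This is only a sketch under the assumption that no code appears in more than one list; the sample frame and the name group_select are made up for the comparison:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "code": ["AA", "BB", "CC", "AA", "GG", "BB", "NN", "YY"],
})
col = "code"

# group -> list of codes (assumed: each code belongs to at most one group).
d1 = {"A": ["AA", "CC"], "G": ["GG", "YY"], "B": ["BB"]}

# Variant 1: invert to code -> group, then map and fill the default.
d = {code: group for group, codes in d1.items() for code in codes}
df["group"] = df[col].map(d).fillna("other")

# Variant 2: one boolean condition per group via isin, fed to np.select.
conditions = [df[col].isin(codes) for codes in d1.values()]
choices = list(d1.keys())
df["group_select"] = np.select(conditions, choices, default="other")

print(df)
```

Both columns should come out identical. The map version needs only a single dictionary lookup per row, while np.select evaluates one isin mask per group, so map is usually the simpler choice when there are many groups.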
