Reputation: 303
My data frame looks like this:
id code
1 AA
2 BB
3 CC
4 AA
5 GG
6 BB
7 NN
8 YY
My desired output looks like this:
id code group
1 AA A
2 BB B
3 CC A
4 AA A
5 GG G
6 BB B
7 NN other
8 YY G
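My code looks like this:
import numpy as np
col = 'code'
# Alternatives are combined with | and each comparison is parenthesized, because & and | bind tighter than ==
conditions = [ (df[col] == 'AA') | (df[col] == 'CC'), (df[col] == 'GG') | (df[col] == 'YY'), df[col] == 'BB' ]
choices = [ 'A', 'G', 'B' ]
df["group"] = np.select(conditions, choices, default='other')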
But the code column has many distinct values, around 40. Some of them belong to A, some to B, some to G, and the rest belong to other. I think I need to create a list of codes for each group in the conditions section; otherwise it is very difficult to do with the code above.
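The list-per-group idea described here could be sketched with Series.isin. This is only a minimal sketch on the sample data from the question; the `groups` dictionary and its contents are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": range(1, 9),
    "code": ["AA", "BB", "CC", "AA", "GG", "BB", "NN", "YY"],
})
col = "code"

# Hypothetical mapping: group label -> list of codes that belong to it
groups = {"A": ["AA", "CC"], "G": ["GG", "YY"], "B": ["BB"]}

# One boolean mask per group, built from that group's list of codes
conditions = [df[col].isin(codes) for codes in groups.values()]
choices = list(groups.keys())

# Rows matching no mask fall back to the default label
df["group"] = np.select(conditions, choices, default="other")
```

With around 40 codes, only the lists in `groups` need to grow; the rest of the code stays the same.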
Upvotes: 2
Views: 55
Reputation: 862431
Use Series.map with a dictionary, then replace the unmatched values with a default value using Series.fillna:
d = {'AA':'A','CC':'A','GG':'G','YY':'G','BB':'B'}
df["group"] = df[col].map(d).fillna('other')
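To see why the fillna step is needed, here is a small self-contained sketch on the sample data from the question. The intermediate name `mapped` is introduced only for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": range(1, 9),
    "code": ["AA", "BB", "CC", "AA", "GG", "BB", "NN", "YY"],
})
col = "code"
d = {'AA': 'A', 'CC': 'A', 'GG': 'G', 'YY': 'G', 'BB': 'B'}

# Codes that are not keys of d (here 'NN') are mapped to NaN
mapped = df[col].map(d)

# fillna replaces those NaN values with the default label
df["group"] = mapped.fillna('other')
```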
If the dictionary has a different format (each group mapped to a list of codes), first convert it to the format used in the solution above:
d1 = {'A': ['AA','CC'], 'G':['GG','YY'], 'B':['BB']}
#swap key values in dict
#http://stackoverflow.com/a/31674731/2901002
d = {k: oldk for oldk, oldv in d1.items() for k in oldv}
df["group"] = df[col].map(d).fillna('other')
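With around 40 codes it is easy to list the same code under two groups by accident, and the dict comprehension would then silently keep only the last one. A possible variant of the swap, written as an explicit loop with a guard, might look like this (the check and its error message are an addition for illustration, not part of the solution above):

```python
d1 = {'A': ['AA', 'CC'], 'G': ['GG', 'YY'], 'B': ['BB']}

# Invert group -> codes into code -> group, refusing duplicate codes
d = {}
for group, codes in d1.items():
    for code in codes:
        if code in d:
            raise ValueError(f"code {code!r} is listed under both {d[code]!r} and {group!r}")
        d[code] = group
```

The resulting `d` can be passed to Series.map in the same way as before.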
Upvotes: 4