Reputation: 183
I have a large DataFrame with more than 3,000 category labels. I'd like to selectively re-code labels based on groupby counts, much like a conditional replace in Excel. For example:
ID Label
1 cat
2 dog
3 cat
4 cat
5 dog
6 bird
count of each:
cat: 3
dog: 2
bird: 1
logic: if count <= 2, then change label to 'other'
ID Label
1 cat
2 other
3 cat
4 cat
5 other
6 other
count of each:
cat: 3
other: 3
Perhaps some of you know a more Pythonic way to accomplish the same thing; maybe a lambda function can help. I have already read a bunch of posts here, as usual.
My meager Python code looks like this:
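import numpy as np
# Store the per-label counts in a separate column so the original labels are not overwritten
df['Count'] = df.groupby('Label')['Label'].transform('count')
df['New_Label'] = np.where(df['Count'] <= 2, 'other', df['Label'])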
Upvotes: 1
Views: 79
Reputation: 3290
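This code uses pd.Series.where() instead of np.where() and does it in one line. Series.where() keeps each value where the condition is True and substitutes 'other' elsewhere, so the condition is count > 2 rather than count <= 2: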
df.Label = df.Label.where(df.groupby('Label')['Label'].transform('count') > 2, 'other')
print(df)
Label
ID
1 cat
2 other
3 cat
4 cat
5 other
6 other
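For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the six-row sample from the question, assuming ID is the index (as the printed output suggests). The names min_count, counts, New_Label and alt are only illustrative. It also shows an equivalent variant that maps value_counts() back onto the column, which should give the same result.

```python
import pandas as pd

# Sample data mirroring the question; ID is assumed to be the index
df = pd.DataFrame(
    {"Label": ["cat", "dog", "cat", "cat", "dog", "bird"]},
    index=pd.Index([1, 2, 3, 4, 5, 6], name="ID"),
)

min_count = 2  # illustrative threshold: labels with count <= 2 become 'other'

# For every row, the number of rows that share its label
counts = df.groupby("Label")["Label"].transform("count")

# Series.where keeps the label where the condition is True, else uses 'other'
df["New_Label"] = df["Label"].where(counts > min_count, "other")

# Equivalent variant: look up each label's frequency via value_counts()
alt = df["Label"].where(
    df["Label"].map(df["Label"].value_counts()) > min_count, "other"
)

print(df["New_Label"].value_counts())
```

Writing to a new column keeps the original labels around for checking; assign back to df["Label"] instead if you want to replace them in place.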
Upvotes: 1