logisticregress

pythonic conditional aggregation

I have a large DataFrame with more than 3,000 category labels. I'd like to selectively recode labels based on their groupby counts, much like a conditional replace in Excel. For example:

ID Label   
1  cat  
2  dog  
3  cat  
4  cat  
5  dog  
6  bird 

Count of each label:

cat: 3  
dog: 2  
bird: 1   

Logic: if a label's count is <= 2, change that label to 'other':

ID Label   
1  cat  
2  other  
3  cat  
4  cat  
5  other  
6  other  

Count of each label after recoding:

cat: 3  
other: 3  
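A minimal self-contained sketch of the recoding described above, assuming the sample data with ID as the index. It counts each label once with value_counts() and then uses isin() plus Series.where(), an alternative to a groupby transform rather than the asker's own method:

```python
import pandas as pd

# Sample data from the question, with ID as the index
df = pd.DataFrame(
    {"Label": ["cat", "dog", "cat", "cat", "dog", "bird"]},
    index=pd.Index([1, 2, 3, 4, 5, 6], name="ID"),
)

# Count occurrences of each label once (cat 3, dog 2, bird 1)
counts = df["Label"].value_counts()

# Labels seen more than 2 times are kept; every other label becomes 'other'
keep = counts[counts > 2].index
df["Label"] = df["Label"].where(df["Label"].isin(keep), "other")

print(df["Label"].value_counts())
```

Counting once per label, instead of once per row, keeps the work small when there are thousands of distinct labels.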

Is there a more Pythonic way to accomplish the same thing? Maybe a lambda function could help.

I have already read a number of related posts here.
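Since a lambda was mentioned, here is one hedged sketch of how a lambda could express the same rule, assuming the sample data from the question. It builds a per-label count dictionary and applies the lambda with Series.map(); note that the lambda runs once per row in plain Python, so on a large frame it is typically slower than a vectorized where():

```python
import pandas as pd

df = pd.DataFrame(
    {"Label": ["cat", "dog", "cat", "cat", "dog", "bird"]},
    index=pd.Index([1, 2, 3, 4, 5, 6], name="ID"),
)

# Per-label counts as a plain dict for lookup inside the lambda
counts = df["Label"].value_counts().to_dict()

# map() calls the lambda for each row; labels with count <= 2 become 'other'
df["Label"] = df["Label"].map(lambda lab: lab if counts[lab] > 2 else "other")

print(df)
```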

My meager Python code looks like this:

import numpy as np
# count per label, kept in a separate Series so the original labels survive
counts = df.groupby('Label')['Label'].transform('count')
df['New_Label'] = np.where(counts <= 2, 'other', df['Label'])

Answers (1)

Nathaniel

This code uses pandas' Series.where() instead of np.where() and does the whole thing in one line:

df.Label = df.Label.where(df.groupby('Label')['Label'].transform('count') > 2, 'other')
print(df)
     Label
ID       
1      cat
2    other
3      cat
4      cat
5    other
6    other
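The same one-liner can be wrapped in a small reusable function with the threshold as a parameter. This is a sketch only: the name `recode_rare` and its `min_count` and `other` parameters are hypothetical, and `min_count=3` corresponds to the question's "count <= 2 becomes 'other'" rule:

```python
import pandas as pd


def recode_rare(s: pd.Series, min_count: int = 3, other: str = "other") -> pd.Series:
    """Replace values of s that occur fewer than min_count times with `other`."""
    # transform('count') gives every row the size of its own label's group
    counts = s.groupby(s).transform("count")
    # where() keeps values where the condition holds and substitutes `other` elsewhere
    return s.where(counts >= min_count, other)


df = pd.DataFrame(
    {"Label": ["cat", "dog", "cat", "cat", "dog", "bird"]},
    index=pd.Index([1, 2, 3, 4, 5, 6], name="ID"),
)
df["Label"] = recode_rare(df["Label"], min_count=3)
print(df)
```

Because the function returns a new Series, it can also be assigned to a separate column (for example `df["New_Label"]`) if the original labels need to be kept.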
