william007

Reputation: 18547

Setting values whose value_counts is below a threshold to "other"

I want to label every item whose count is <= 1 as "other". Code for the input table:

import pandas as pd
df=pd.DataFrame({"item":['a','a','a','b','b','c','d']})

input table:

  item
0    a
1    a
2    a
3    b
4    b
5    c
6    d

expected output:

  item result
0    a      a
1    a      a
2    a      a
3    b      b
4    b      b
5    c  other
6    d  other

How could I achieve that?

Upvotes: 2

Views: 456

Answers (2)

Mayank Porwal

Reputation: 34086

Use numpy.where with GroupBy.transform('count') and Series.le to replace items whose group count is at most 1:

In [926]: import numpy as np

In [927]: df['result'] = np.where(df.groupby('item')['item'].transform('count').le(1), 'other', df.item)

In [928]: df
Out[928]: 
  item result
0    a      a
1    a      a
2    a      a
3    b      b
4    b      b
5    c  other
6    d  other

Or use GroupBy.size with merge (the count column produced by reset_index is labelled 0):

In [917]: x = df.groupby('item').size().reset_index()

In [919]: ans = df.merge(x)
In [921]: ans['result'] = np.where(ans[0].le(1), 'other', ans.item)

In [923]: ans = ans.drop(columns=0)  # positional axis (drop(0, 1)) no longer works in pandas 2.x

In [924]: ans
Out[924]: 
  item result
0    a      a
1    a      a
2    a      a
3    b      b
4    b      b
5    c  other
6    d  other
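
The merge approach above can also be sketched with an explicitly named count column, which avoids relying on the integer label 0. This is only a minimal sketch: the column name `n` and the `how='left'` choice are my own assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"item": ['a', 'a', 'a', 'b', 'b', 'c', 'd']})

# Count rows per item; 'n' is a hypothetical name for the count column.
counts = df.groupby('item').size().reset_index(name='n')

# A left merge keeps every row of df in its original order.
ans = df.merge(counts, on='item', how='left')

# Replace items that occur at most once, then drop the helper column.
ans['result'] = np.where(ans['n'].le(1), 'other', ans['item'])
ans = ans.drop(columns='n')
print(ans)
```

Naming the column also makes the later drop easier to read than dropping a column called 0.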

Upvotes: 1

jezrael

Reputation: 863741

Use Series.where with Series.duplicated and keep=False, which marks every value that occurs more than once:

df['result'] = df.item.where(df.item.duplicated(keep=False), 'other')

Or use GroupBy.transform and test for group sizes greater than 1 with Series.gt:

df['result'] = df.item.where(df.groupby('item')['item'].transform('size').gt(1), 'other')

Or use Series.map with Series.value_counts:

df['result'] = df.item.where(df['item'].map(df['item'].value_counts()).gt(1), 'other')

print(df)
  item result
0    a      a
1    a      a
2    a      a
3    b      b
4    b      b
5    c  other
6    d  other
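
Since the question title mentions a threshold, the value_counts variant can be wrapped in a small helper that takes the minimum count as a parameter. This is a sketch under my own assumptions: the function name `lump_rare` and the parameters `min_count` and `other` are hypothetical, not pandas API.

```python
import pandas as pd

def lump_rare(s, min_count=2, other='other'):
    """Keep values seen at least min_count times; replace the rest with other.

    Hypothetical helper, not part of pandas.
    """
    # Map each element to the number of times its value occurs in s.
    counts = s.map(s.value_counts())
    # Series.where keeps values where the condition holds, else uses `other`.
    return s.where(counts.ge(min_count), other)

df = pd.DataFrame({"item": ['a', 'a', 'a', 'b', 'b', 'c', 'd']})
df['result'] = lump_rare(df['item'])                 # same rule as count <= 1
df['result3'] = lump_rare(df['item'], min_count=3)   # stricter threshold
print(df)
```

With min_count=2 this matches the expected output in the question; raising it to 3 also turns the two b rows into other.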

Upvotes: 2
