Harish

Reputation: 49

Creating a Python function to create categorical bins in pandas

I'm trying to create a reusable function in Python 2.7 (pandas) to form categorical bins, i.e. to group infrequent categories as 'other'. Can someone help me turn the code below into a function? col1, col2, etc. are different categorical variable columns.

##Reducing categories by binning categorical variables - column1
a = df.col1.value_counts()
#get top 5 values of index
vals = a[:5].index
df['col1_new'] = df.col1.where(df.col1.isin(vals), 'other')
df = df.drop(['col1'],axis=1)

##Reducing categories by binning categorical variables - column2
a = df.col2.value_counts()
#get top 6 values of index
vals = a[:6].index
df['col2_new'] = df.col2.where(df.col2.isin(vals), 'other')
df = df.drop(['col2'],axis=1)
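The two blocks above differ only in the column name and in how many categories are kept, so the same steps can be driven by a mapping from column to top-n count. This is a minimal sketch, assuming pandas is available; the function name `bin_rare_categories` and its `top_n` dict argument are hypothetical, chosen only for illustration. It reproduces the question's behaviour: a `<col>_new` column is added and the original column is dropped.

```python
import pandas as pd


def bin_rare_categories(df, top_n, other='other'):
    """Hypothetical helper: for each column in top_n, keep its n most
    frequent values, replace the rest with `other`, store the result in
    '<col>_new' and drop the original column. Returns a new DataFrame."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col, n in top_n.items():
        # value_counts() is sorted by frequency (descending), so head(n) is the top n
        keep = out[col].value_counts().head(n).index
        out[col + '_new'] = out[col].where(out[col].isin(keep), other)
        out = out.drop([col], axis=1)
    return out


df = pd.DataFrame({'col1': list('aaabbc'), 'col2': list('xxyyyz')})
print(bin_rare_categories(df, {'col1': 2, 'col2': 1}))
```

If several categories share the same count at the cutoff, which of them survive depends on the order in which `value_counts` returns the tied values.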

Upvotes: 1

Views: 2172

Answers (1)

jezrael

Reputation: 863801

You can use:

import pandas as pd

df = pd.DataFrame({'A':list('abcdefabcdefabffeg'),
                   'D':[1,3,5,7,1,0,1,3,5,7,1,0,1,3,5,7,1,0]})

print (df)
    A  D
0   a  1
1   b  3
2   c  5
3   d  7
4   e  1
5   f  0
6   a  1
7   b  3
8   c  5
9   d  7
10  e  1
11  f  0
12  a  1
13  b  3
14  f  5
15  f  7
16  e  1
17  g  0

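def replace_under_top(df, c, n):
    a = df[c].value_counts()
    #get top n values of index (value_counts is sorted by frequency)
    vals = a.head(n).index
    #replace values outside the top n with 'other'
    #(this assigns into the passed DataFrame, so the caller's df changes too)
    df[c] = df[c].where(df[c].isin(vals), 'other')
    #rename processed column
    df = df.rename(columns={c : c + '_new'})
    return df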

Test:

df1 = replace_under_top(df, 'A', 3)
print (df1)
    A_new  D
0   other  1
1       b  3
2   other  5
3   other  7
4       e  1
5       f  0
6   other  1
7       b  3
8   other  5
9   other  7
10      e  1
11      f  0
12  other  1
13      b  3
14      f  5
15      f  7
16      e  1
17  other  0

df2 = replace_under_top(df, 'D', 4)
print (df2)
        A  D_new
0   other      1
1       b      3
2   other      5
3   other      7
4       e      1
5       f  other
6   other      1
7       b      3
8   other      5
9   other      7
10      e      1
11      f  other
12  other      1
13      b      3
14      f      5
15      f      7
16      e      1
17  other  other
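Because `replace_under_top` assigns into the DataFrame it receives, the first call also changed `df` itself, which is why column `A` in the `df2` output already contains `other`. If that side effect is not wanted, a variant can work on a copy instead. A sketch under that assumption; the name `replace_under_top_copy` is hypothetical:

```python
import pandas as pd


def replace_under_top_copy(df, c, n, other='other'):
    # work on a copy so the caller's DataFrame is not modified
    out = df.copy()
    # value_counts() is sorted by frequency, so head(n) gives the n most common values
    vals = out[c].value_counts().head(n).index
    out[c] = out[c].where(out[c].isin(vals), other)
    # rename the processed column, as in replace_under_top
    return out.rename(columns={c: c + '_new'})


df = pd.DataFrame({'A': list('abcdefabcdefabffeg'),
                   'D': [1, 3, 5, 7, 1, 0, 1, 3, 5, 7, 1, 0, 1, 3, 5, 7, 1, 0]})
df1 = replace_under_top_copy(df, 'A', 3)
df2 = replace_under_top_copy(df, 'D', 4)
print(df2.head())  # column A still holds the original letters
```

To bin both columns deliberately, pass the result of one call into the next, e.g. `replace_under_top_copy(df1, 'D', 4)`.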

Upvotes: 3
