prp

Reputation: 962

Pandas 'get_dummies' for specific factors

I have a df like this one:

import pandas as pd

cols = ['id', 'factor_var']
values = [
    [1, 'a'],
    [2, 'a'],
    [3, 'a'],
    [4, 'b'],
    [5, 'b'],
    [6, 'c'],
    [7, 'c'],
    [8, 'c'],
    [9, 'c'],
    [10, 'c'],
    [11, 'd'],
]

df = pd.DataFrame(values, columns=cols)

My target df has the following columns:

target_columns = ['id', 'factor_var_a', 'factor_var_b', 'factor_var_other']

The column factor_var_other should flag every category in factor_var that is not a or b, regardless of how often each category appears.

Any ideas will be much appreciated.

Upvotes: 1

Views: 22

Answers (1)

jezrael

Reputation: 862471

You can replace the values that are not in the list with 'other' using Series.where, assign the result back with DataFrame.assign, and finally call get_dummies:

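# keep 'a' and 'b', replace every other category with 'other'
s = df['factor_var'].where(df['factor_var'].isin(['a','b']), 'other')
#alternative
#s = df['factor_var'].map({'a':'a','b':'b'}).fillna('other')
# dtype=int gives 0/1 columns (pandas >= 2.0 returns bool columns by default)
df = pd.get_dummies(df.assign(factor_var=s), columns=['factor_var'], dtype=int)
print(df)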
    id  factor_var_a  factor_var_b  factor_var_other
0    1             1             0                 0
1    2             1             0                 0
2    3             1             0                 0
3    4             0             1                 0
4    5             0             1                 0
5    6             0             0                 1
6    7             0             0                 1
7    8             0             0                 1
8    9             0             0                 1
9   10             0             0                 1
10  11             0             0                 1
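
One caveat: get_dummies on a plain string column only creates a dummy column for categories that actually occur, so if the data happened to contain no 'b' rows, factor_var_b would be missing from the result. A minimal sketch, assuming you always want the full target column set: convert the collapsed column to a pd.Categorical with fixed categories before encoding, since get_dummies emits a column for every declared category, used or not. The helper name dummies_with_other and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd


def dummies_with_other(df, col, keep, other="other"):
    """Hypothetical helper: one-hot encode `col`, collapsing every category
    not in `keep` into `other`.

    Declaring the categories up front makes get_dummies emit one column per
    kept category plus `other`, even when some category is absent from the data.
    """
    # replace anything outside `keep` with the catch-all label
    collapsed = df[col].where(df[col].isin(keep), other)
    # fixed category order also fixes the order of the dummy columns
    cat = pd.Categorical(collapsed, categories=list(keep) + [other])
    return pd.get_dummies(df.assign(**{col: cat}), columns=[col], dtype=int)


# sample data with no 'b' rows at all
df = pd.DataFrame({"id": [1, 2, 3], "factor_var": ["a", "c", "d"]})
out = dummies_with_other(df, "factor_var", ["a", "b"])
print(out)
```

Here factor_var_b is still present (all zeros), so the output always matches target_columns.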

Upvotes: 1
