Pandas 'get_dummies' for specific factors

Question

I have a DataFrame like this one:

import pandas as pd

cols = ['id', 'factor_var']
values = [
    [1, 'a'],
    [2, 'a'],
    [3, 'a'],
    [4, 'b'],
    [5, 'b'],
    [6, 'c'],
    [7, 'c'],
    [8, 'c'],
    [9, 'c'],
    [10, 'c'],
    [11, 'd'],
]

df = pd.DataFrame(values, columns=cols)

My target df has the following columns:

target_columns = ['id', 'factor_var_a', 'factor_var_b', 'factor_var_other']

The column factor_var_other should cover every category in factor_var other than a or b, regardless of how often each category appears.

Any ideas will be much appreciated.

jezrael · Accepted Answer

You can replace the values that are not in the list with Series.where, assign the result back with DataFrame.assign, and finally call get_dummies:

s = df['factor_var'].where(df['factor_var'].isin(['a','b']), 'other')
# alternative:
# s = df['factor_var'].map({'a':'a','b':'b'}).fillna('other')
# dtype=int keeps the 0/1 output shown below; pandas 2.0+ returns booleans by default
df = pd.get_dummies(df.assign(factor_var=s), columns=['factor_var'], dtype=int)
print(df)
    id  factor_var_a  factor_var_b  factor_var_other
0    1             1             0                 0
1    2             1             0                 0
2    3             1             0                 0
3    4             0             1                 0
4    5             0             1                 0
5    6             0             0                 1
6    7             0             0                 1
7    8             0             0                 1
8    9             0             0                 1
9   10             0             0                 1
10  11             0             0                 1
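
If you need this for more than one column, the same idea can be wrapped in a small helper. This is only a sketch: the function name dummies_with_other and its parameters (keep, other_label) are made up for illustration and are not part of pandas. It also converts the pooled values to a Categorical with a fixed category list, on the assumption that you want a column for every kept level even when that level does not occur in the data.

```python
import pandas as pd


def dummies_with_other(frame, column, keep, other_label="other"):
    """One-hot encode `column`, pooling every value not in `keep` into one level.

    Hypothetical helper for illustration; not a pandas function.
    """
    # Keep the listed categories and replace everything else with the pooled label.
    pooled = frame[column].where(frame[column].isin(keep), other_label)
    # A fixed category list yields a dummy column for every level, even an absent one.
    pooled = pd.Categorical(pooled, categories=[*keep, other_label])
    return pd.get_dummies(
        frame.assign(**{column: pooled}), columns=[column], dtype=int
    )


df = pd.DataFrame({"id": [1, 2, 3, 4], "factor_var": ["a", "b", "c", "d"]})
result = dummies_with_other(df, "factor_var", keep=["a", "b"])
print(result.columns.tolist())
# ['id', 'factor_var_a', 'factor_var_b', 'factor_var_other']
```

Without the Categorical step, a level that never appears (for example no 'b' rows in a small sample) would simply have no column, which can break code that expects a fixed set of target columns.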
