Reputation: 962
I have a df like this one:
import pandas as pd
cols = ['id', 'factor_var']
values = [
    [1, 'a'],
    [2, 'a'],
    [3, 'a'],
    [4, 'b'],
    [5, 'b'],
    [6, 'c'],
    [7, 'c'],
    [8, 'c'],
    [9, 'c'],
    [10, 'c'],
    [11, 'd'],
]
df = pd.DataFrame(values, columns=cols)
My target df has the following columns:
target_columns = ['id', 'factor_var_a', 'factor_var_b', 'factor_var_other']
The column factor_var_other should flag every category in factor_var that is not a or b, regardless of how often each category appears.
Any ideas will be much appreciated.
Upvotes: 1
Reputation: 862471
You can replace the values that are not in the list with 'other' using Series.where, assign the result back with DataFrame.assign, and finally call get_dummies:
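# keep 'a' and 'b', replace every other category with 'other'
s = df['factor_var'].where(df['factor_var'].isin(['a','b']), 'other')
# alternative: map the kept categories and fill the rest with 'other'
#s = df['factor_var'].map({'a':'a','b':'b'}).fillna('other')
# dtype=int gives 0/1 columns (pandas >= 2.0 returns booleans by default)
df = pd.get_dummies(df.assign(factor_var=s), columns=['factor_var'], dtype=int)
print(df)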
    id  factor_var_a  factor_var_b  factor_var_other
0    1             1             0                 0
1    2             1             0                 0
2    3             1             0                 0
3    4             0             1                 0
4    5             0             1                 0
5    6             0             0                 1
6    7             0             0                 1
7    8             0             0                 1
8    9             0             0                 1
9   10             0             0                 1
10  11             0             0                 1
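If you need this for other columns or other sets of kept categories, the same idea can be wrapped in a small function. This is only a sketch: dummies_with_other and its parameters keep and other_label are hypothetical names, not part of pandas. It also converts the collapsed column to a Categorical with fixed categories, which is an addition to the approach above, so that each expected dummy column is created even when a label never occurs in the data.

```python
import pandas as pd


def dummies_with_other(df, column, keep, other_label="other"):
    """One-hot encode `column`, lumping categories outside `keep` into one label.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Keep values listed in `keep`, replace everything else with `other_label`.
    collapsed = df[column].where(df[column].isin(keep), other_label)
    # Fixed categories guarantee one dummy column per label,
    # even when a label never occurs in the data.
    collapsed = pd.Categorical(collapsed, categories=[*keep, other_label])
    # dtype=int yields 0/1 instead of the boolean default of recent pandas versions.
    return pd.get_dummies(
        df.assign(**{column: collapsed}), columns=[column], dtype=int
    )


df = pd.DataFrame({"id": range(1, 12), "factor_var": list("aaabbcccccd")})
result = dummies_with_other(df, "factor_var", keep=["a", "b"])
print(result.columns.tolist())
# ['id', 'factor_var_a', 'factor_var_b', 'factor_var_other']
```

The order of keep determines the order of the dummy columns, with the other_label column always last.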
Upvotes: 1