Reputation: 703
I have a pandas DataFrame with some numerical and some categorical (str) columns, for example:
   A  B     C  D
0  x  y     a  2
1  x  x    aa  1
2  y  z    aa  4
3  y  z    aa  4
4  x  y  aaaa  0
I want to convert all the categorical values into boolean indicators. Because different columns can contain the same value names, I want the new column names to keep them distinguishable, for example column_name + '_is_' + value_name.
The expected result is:
   D  A_is_x  A_is_y  B_is_y  B_is_x  B_is_z  C_is_a  C_is_aa  C_is_aaaa
0  2    True   False    True   False   False    True    False      False
1  1    True   False   False    True   False   False     True      False
2  4   False    True   False   False    True   False     True      False
3  4   False    True   False   False    True   False     True      False
4  0    True   False    True   False   False   False    False       True
I wrote some code that works, but it's not very pythonic.
    for col in data.columns:
        if not np.issubdtype(data[col].dtypes, np.number):
            values = data[col].unique()
            for value in values:
                data[col + '_is_' + value] = data[col].map(lambda x: x == value)
            data = data.drop(col, axis=1)
I tried to write this using pd.get_dummies, but I had trouble naming the newly created columns conveniently. Is there an easier and cleaner solution than mine?
I know there are some related questions, but none of them solves my problem of naming the columns conveniently.
Upvotes: 1
Views: 371
Reputation: 862511
Use get_dummies with the parameters prefix_sep='_is_' and dtype=bool. The numeric column is not processed and comes first in the result, as you need:
df = pd.get_dummies(df, prefix_sep='_is_', dtype=bool)
print (df)
   D  A_is_x  A_is_y  B_is_x  B_is_y  B_is_z  C_is_a  C_is_aa  C_is_aaaa
0  2    True   False   False    True   False    True    False      False
1  1    True   False    True   False   False   False     True      False
2  4   False    True   False   False    True   False     True      False
3  4   False    True   False   False    True   False     True      False
4  0    True   False   False    True   False   False    False       True
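For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the table in the question, and the name `result` is only illustrative. Note that the dummy columns come out sorted by value within each source column, so B_is_x precedes B_is_y, which differs slightly from the column order in the question's expected table.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "A": ["x", "x", "y", "y", "x"],
    "B": ["y", "x", "z", "z", "y"],
    "C": ["a", "aa", "aa", "aa", "aaaa"],
    "D": [2, 1, 4, 4, 0],
})

# With no `columns` argument, get_dummies encodes only the string-like
# columns; the numeric column D is passed through unchanged and placed first.
result = pd.get_dummies(df, prefix_sep="_is_", dtype=bool)

print(result.columns.tolist())
print(result.dtypes)
```

If the exact column order from the question matters, the result can be reindexed afterwards with an explicit list of column names.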
Upvotes: 2
Reputation: 323226
Check get_dummies
df = df[['D']].join(pd.get_dummies(df[['A', 'B', 'C']], prefix_sep='_is_').astype(bool))
df
Out[390]:
   D  A_is_x  A_is_y  B_is_x  B_is_y  B_is_z  C_is_a  C_is_aa  C_is_aaaa
0  2    True   False   False    True   False    True    False      False
1  1    True   False    True   False   False   False     True      False
2  4   False    True   False   False    True   False     True      False
3  4   False    True   False   False    True   False     True      False
4  0    True   False   False    True   False   False    False       True
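The line above hardcodes the column names 'A', 'B', 'C' and 'D'. As a possible variation (not part of the original answer), the non-numeric columns could be picked by dtype instead, which mirrors the np.number check in the question's loop. The names `cat_cols` and `encoded` are hypothetical, and the sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "A": ["x", "x", "y", "y", "x"],
    "B": ["y", "x", "z", "z", "y"],
    "C": ["a", "aa", "aa", "aa", "aaaa"],
    "D": [2, 1, 4, 4, 0],
})

# Hypothetical helper name: every column that is not numeric.
cat_cols = df.select_dtypes(exclude="number").columns.tolist()

# Encode only those columns; the remaining (numeric) columns are kept
# as they are and placed before the new indicator columns.
encoded = pd.get_dummies(df, columns=cat_cols, prefix_sep="_is_", dtype=bool)

print(cat_cols)
print(encoded.columns.tolist())
```

Passing `columns=` explicitly also makes it easy to leave some string columns unencoded by dropping them from the list.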
Upvotes: 2