Alexander Golys

Reputation: 703

Transforming columns of strings into columns of boolean indicators in a pandas DataFrame

I have a pandas DataFrame with some numerical and some categorical (str) values, let's say this:

   A  B     C  D
0  x  y     a  2
1  x  x    aa  1
2  y  z    aa  4
3  y  z    aa  4
4  x  y  aaaa  0

I want to convert all the categorical values into boolean indicators. Because different columns can contain the same values, I want the new column names to be distinguishable, for example column_name + '_is_' + value_name.

The expected result is:

   D  A_is_x  A_is_y  B_is_y  B_is_x  B_is_z  C_is_a  C_is_aa  C_is_aaaa
0  2    True   False    True   False   False    True    False      False
1  1    True   False   False    True   False   False     True      False
2  4   False    True   False   False    True   False     True      False
3  4   False    True   False   False    True   False     True      False
4  0    True   False    True   False   False   False    False       True

I wrote some code that works, but it's not very pythonic.

    import numpy as np

    for col in data.columns:
        # Only non-numeric columns are expanded into indicator columns.
        if not np.issubdtype(data[col].dtypes, np.number):
            values = data[col].unique()
            for value in values:
                data[col + '_is_' + value] = data[col].map(lambda x: x == value)
            data = data.drop(col, axis=1)

I tried to write this using pd.get_dummies, but I had trouble naming the newly created columns conveniently. Is there an easier and cleaner solution than mine?

I know there are some related questions, but none of them solves my problem of naming the columns conveniently.

Upvotes: 1

Views: 371

Answers (2)

jezrael

Reputation: 862511

Use get_dummies with the parameters prefix_sep='_is_' and dtype=bool. The numeric column is not processed and comes first in the output, as you need:

df = pd.get_dummies(df, prefix_sep='_is_', dtype=bool)

print(df)
   D  A_is_x  A_is_y  B_is_x  B_is_y  B_is_z  C_is_a  C_is_aa  C_is_aaaa
0  2    True   False   False    True   False    True    False      False
1  1    True   False    True   False   False   False     True      False
2  4   False    True   False   False    True   False     True      False
3  4   False    True   False   False    True   False     True      False
4  0    True   False   False    True   False   False    False       True
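
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame is rebuilt by hand from the sample in the question, and the variable name result is only illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'A': ['x', 'x', 'y', 'y', 'x'],
    'B': ['y', 'x', 'z', 'z', 'y'],
    'C': ['a', 'aa', 'aa', 'aa', 'aaaa'],
    'D': [2, 1, 4, 4, 0],
})

# String columns are expanded into indicators named <column>_is_<value>;
# the numeric column D is passed through unchanged and placed first.
result = pd.get_dummies(df, prefix_sep='_is_', dtype=bool)

print(result.columns.tolist())
```

Note that get_dummies orders the indicator columns by sorted value within each source column, so B_is_x comes before B_is_y here, whereas the loop in the question follows order of first appearance.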

Upvotes: 2

BENY

Reputation: 323226

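Check get_dummies: apply it to the categorical columns only, convert the result to bool, and join it back to the numeric column: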

df = df[['D']].join(pd.get_dummies(df[['A', 'B', 'C']], prefix_sep='_is_').astype(bool))
df
Out[390]: 
   D  A_is_x  A_is_y  B_is_x  B_is_y  B_is_z  C_is_a  C_is_aa  C_is_aaaa
0  2    True   False   False    True   False    True    False      False
1  1    True   False    True   False   False   False     True      False
2  4   False    True   False   False    True   False     True      False
3  4   False    True   False   False    True   False     True      False
4  0    True   False   False    True   False   False    False       True
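
If the column names are not known in advance, the same split-and-join idea can probably be written without hardcoding them. The sketch below is an assumption on my part rather than part of the answer above: it selects the non-numeric columns with select_dtypes, and the names cat_cols and result are illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'A': ['x', 'x', 'y', 'y', 'x'],
    'B': ['y', 'x', 'z', 'z', 'y'],
    'C': ['a', 'aa', 'aa', 'aa', 'aaaa'],
    'D': [2, 1, 4, 4, 0],
})

# Pick every non-numeric column by dtype instead of listing A, B, C by hand.
cat_cols = df.select_dtypes(exclude='number').columns

# Keep the numeric columns as they are and join the boolean indicators to them.
result = df.drop(columns=cat_cols).join(
    pd.get_dummies(df[cat_cols], prefix_sep='_is_', dtype=bool)
)

print(result.columns.tolist())
```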

Upvotes: 2
