Reputation: 1754

How to create dummies for certain columns with pandas.get_dummies()

import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                   'C': ['1', '2', '3'],
                   'D': ['j', 'l', 'j']})

I only want dummies for columns A and D, not for column B or C. If I use pd.get_dummies(df), all columns are turned into dummies.

I want the final result to contain all of the columns, which means columns B and C still exist, like 'A_x','A_y','B','C','D_j','D_l'.

Upvotes: 46

Answers (4)

Trenton McKinney

Reputation: 62383

The other answers are great for the specific example in the OP.
This answer is for cases where there are many columns and typing out all the column names is too cumbersome.
It is a non-exhaustive set of ways to pass many columns to get_dummies while excluding some.
Using the built-in filter() function on df.columns is also an option.
When columns=None, pd.get_dummies only converts columns with object, string, or category dtype.
- Another potential option is therefore to give only the columns to be transformed one of those dtypes, and make sure the columns that shouldn't be transformed have some other dtype.
Using set(), as shown in another answer, is yet another option.
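
As a rough sketch of two of the options mentioned above (the built-in filter() and the dtype-based approach): the frames below are made up for illustration, and keep_as_is, num, via_filter and via_dtype are hypothetical names, not part of the original answer.

```python
import pandas as pd

# small frame from the question
df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                   'C': ['1', '2', '3'], 'D': ['j', 'l', 'j']})

# built-in filter(): keep only the column names that should be encoded
keep_as_is = {'B', 'C'}  # assumption: the columns to leave untouched
dummy_cols = list(filter(lambda col: col not in keep_as_is, df.columns))
via_filter = pd.get_dummies(df, columns=dummy_cols)

# dtype-based: with columns=None only object/string/category columns are encoded,
# so cast just the columns to encode and leave the rest numeric
num = pd.DataFrame({'A': [1, 2, 1], 'B': [3, 3, 4], 'C': [5, 6, 5]})
num = num.astype({'A': 'category', 'C': 'category'})
via_dtype = pd.get_dummies(num)

print(list(via_filter.columns))
print(list(via_dtype.columns))
```

Note that the dtype-based route only helps when the columns to keep are not themselves object, string, or category columns, which is why it is shown on a numeric frame.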

import pandas as pd
import string  # for data
import numpy as np

# create test data
np.random.seed(15)
df = pd.DataFrame(np.random.randint(1, 4, size=(5, 10)), columns=list(string.ascii_uppercase[:10]))

# display(df)
   A  B  C  D  E  F  G  H  I  J
0  1  2  1  2  1  1  2  3  2  2
1  2  1  3  3  1  2  2  1  2  1
2  2  3  1  3  2  2  1  2  3  3
3  3  2  1  2  3  2  3  1  3  1
4  1  1  1  3  3  1  2  1  2  1

Option 1

If the excluded columns are fewer than the included columns, specify the columns to remove, and then use a list comprehension to remove them from the list being passed to the columns= parameter.

# columns not to transform
not_cols = ['C', 'G']

# get dummies
df_dummies = pd.get_dummies(data=df, columns=[col for col in df.columns if col not in not_cols])

   C  G  A_1  A_2  A_3  B_1  B_2  B_3  D_2  D_3  E_1  E_2  E_3  F_1  F_2  H_1  H_2  H_3  I_2  I_3  J_1  J_2  J_3
0  1  2    1    0    0    0    1    0    1    0    1    0    0    1    0    0    0    1    1    0    0    1    0
1  3  2    0    1    0    1    0    0    0    1    1    0    0    0    1    1    0    0    1    0    1    0    0
2  1  1    0    1    0    0    0    1    0    1    0    1    0    0    1    0    1    0    0    1    0    0    1
3  1  3    0    0    1    0    1    0    1    0    0    0    1    0    1    1    0    0    0    1    1    0    0
4  1  2    1    0    0    1    0    0    0    1    0    0    1    1    0    1    0    0    1    0    1    0    0

Option 2

If the columns to exclude are at the beginning or end, slice df.columns.

df_dummies = pd.get_dummies(data=df, columns=df.columns[2:])

   A  B  C_1  C_3  D_2  D_3  E_1  E_2  E_3  F_1  F_2  G_1  G_2  G_3  H_1  H_2  H_3  I_2  I_3  J_1  J_2  J_3
0  1  2    1    0    1    0    1    0    0    1    0    0    1    0    0    0    1    1    0    0    1    0
1  2  1    0    1    0    1    1    0    0    0    1    0    1    0    1    0    0    1    0    1    0    0
2  2  3    1    0    0    1    0    1    0    0    1    1    0    0    0    1    0    0    1    0    0    1
3  3  2    1    0    1    0    0    0    1    0    1    0    0    1    1    0    0    0    1    1    0    0
4  1  1    1    0    0    1    0    0    1    1    0    0    1    0    1    0    0    1    0    1    0    0
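
The example above excludes columns at the beginning. For columns at the end, a negative slice should work the same way. A minimal sketch, assuming the same test data and that the last two columns (I and J) are the ones to leave untouched:

```python
import string

import numpy as np
import pandas as pd

# same test data as above
np.random.seed(15)
df = pd.DataFrame(np.random.randint(1, 4, size=(5, 10)),
                  columns=list(string.ascii_uppercase[:10]))

# encode everything except the last two columns
df_dummies = pd.get_dummies(data=df, columns=df.columns[:-2])

# the untouched columns come first, followed by the dummy columns
print(list(df_dummies.columns[:2]))
```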

Option 3

Specify slices for the columns to transform, and then concat the excluded columns with the dummies.
- Uses pd.concat, similar to another answer, but with more columns.
np.r_ concatenates the slice objects into a single array of column positions.

slices = np.r_[slice(0, 2), slice(3, 6), slice(7, 10)]
excluded = [2, 6]

df_dummies = pd.concat([df.iloc[:, excluded], pd.get_dummies(data=df.iloc[:, slices].astype(object))], axis=1)

   C  G  A_1  A_2  A_3  B_1  B_2  B_3  D_2  D_3  E_1  E_2  E_3  F_1  F_2  H_1  H_2  H_3  I_2  I_3  J_1  J_2  J_3
0  1  2    1    0    0    0    1    0    1    0    1    0    0    1    0    0    0    1    1    0    0    1    0
1  3  2    0    1    0    1    0    0    0    1    1    0    0    0    1    1    0    0    1    0    1    0    0
2  1  1    0    1    0    0    0    1    0    1    0    1    0    0    1    0    1    0    0    1    0    0    1
3  1  3    0    0    1    0    1    0    1    0    0    0    1    0    1    1    0    0    0    1    1    0    0
4  1  2    1    0    0    1    0    0    0    1    0    0    1    1    0    1    0    0    1    0    1    0    0
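
To make the np.r_ step in Option 3 more concrete, here is a small sketch of what it produces. Deriving the excluded positions with np.setdiff1d is an addition for illustration, not part of the original answer.

```python
import numpy as np

# explicit slice objects are expanded and joined into one integer array
positions = np.r_[slice(0, 2), slice(3, 6), slice(7, 10)]

# the colon syntax is equivalent
same = np.r_[0:2, 3:6, 7:10]

# the positions that were skipped, computed instead of typed by hand
excluded = np.setdiff1d(np.arange(10), positions)

print(positions)  # [0 1 3 4 5 7 8 9]
print(excluded)   # [2 6]
```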

Upvotes: 3

Patric Fulop

Reputation: 290

Adding to the answers above: if you have a big dataset with lots of attributes and don't want to specify all of the dummy columns by hand, you can use a set difference:

# assume len(df.columns) == 50
non_dummy_cols = ['A', 'B', 'C']
# takes all 47 other columns
dummy_cols = list(set(df.columns) - set(non_dummy_cols))
df = pd.get_dummies(df, columns=dummy_cols)
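
One caveat: a Python set does not keep the original column order, so the dummy columns may come out in a different order from run to run. If order matters, a possible alternative is Index.difference with sort=False. This is a sketch on the small frame from the question, with non_dummy_cols chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                   'C': ['1', '2', '3'], 'D': ['j', 'l', 'j']})

non_dummy_cols = ['B', 'C']  # assumption: columns to leave as they are

# sort=False keeps the remaining labels in their original order
dummy_cols = df.columns.difference(non_dummy_cols, sort=False).tolist()

result = pd.get_dummies(df, columns=dummy_cols)
print(list(result.columns))
```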

Upvotes: 19

knagaev

Reputation: 2957

It can be done without concatenation by passing the required parameters to get_dummies():

In [294]: pd.get_dummies(df, prefix=['A', 'D'], columns=['A', 'D'])
Out[294]: 
   B  C  A_x  A_y  D_j  D_l
0  z  1  1.0  0.0  1.0  0.0
1  u  2  0.0  1.0  0.0  1.0
2  z  3  1.0  0.0  1.0  0.0

Upvotes: 80

Stefan

Reputation: 42885

Just select the two columns you want to pass to pd.get_dummies(), then pd.concat() the result with the original columns you want to keep unchanged. The new column names indicate the source column and the value each binary variable represents:

pd.concat([pd.get_dummies(df[['A', 'D']]), df[['B', 'C']]], axis=1)

   A_x  A_y  D_j  D_l  B  C
0  1.0  0.0  1.0  0.0  z  1
1  0.0  1.0  0.0  1.0  u  2
2  1.0  0.0  1.0  0.0  z  3

Upvotes: 8
