samthebrand

Reputation: 3090

How to generate dummy variables for only specific values in a column?

I have a pandas dataframe column filled with country codes for 100 countries. I want to use these to do a regression, but I only want to create dummy variables for specific countries in my dataset.

I thought this would work:

dummies = pd.get_dummies(df.CountryCode, prefix='cc_')
df_and_dummies = pd.concat([df,dummies[dummies['cc_US', 'cc_GB']]], axis=1)
df_and_dummies

But it gives me the error:

KeyError: ('cc_US', 'cc_GB')

My dataframe currently looks something like:

dframe = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'],
                'CountryCode': ['UK', 'US', 'RU']})
dframe

[Image: no dummies]

But I want it to look like this:

[Image: with dummy variables]

Is there a simple way to specify which values you want included in the get_dummies method, or is there another way to identify specific dummy variables?

Upvotes: 1

Views: 3754

Answers (1)

joris

Reputation: 139142

For your example data, the dummies DataFrame looks like this:

In [25]: dummies
Out[25]:
   cc_RU  cc_UK  cc_US
0      0      1      0
1      0      0      1
2      1      0      0

To select only some of these columns, pass a list of column names inside the [] indexer:

In [27]: dummies[['cc_US', 'cc_UK']]
Out[27]:
   cc_US  cc_UK
0      0      1
1      1      0
2      0      0
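To make the difference concrete, here is a minimal sketch contrasting the two forms of indexing. It assumes the three-row example data from the question and uses prefix='cc' (with the default prefix_sep='_', a prefix of 'cc_' would produce names like 'cc__US' rather than 'cc_US'); dtype=int is only there so the output is 0/1 on recent pandas versions.

```python
import pandas as pd

# Example country codes from the question
codes = pd.Series(['UK', 'US', 'RU'])

# prefix='cc' plus the default separator '_' yields cc_RU, cc_UK, cc_US
dummies = pd.get_dummies(codes, prefix='cc', dtype=int)

# Comma-separated names inside a single [] form a tuple key,
# so pandas looks for ONE column named ('cc_US', 'cc_UK') and fails.
try:
    dummies['cc_US', 'cc_UK']
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# A list inside [] selects several columns, in the order given.
subset = dummies[['cc_US', 'cc_UK']]
print(tuple_lookup_failed)
print(subset)
```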

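So you were only missing one pair of brackets: dummies['cc_US', 'cc_GB'] is treated as a lookup of a single column keyed by the tuple ('cc_US', 'cc_GB'), which is what raises the KeyError.
The full code then becomes: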

In [29]: pd.concat([df, dummies[['cc_US', 'cc_UK']]], axis=1)
Out[29]:
   A  B CountryCode  cc_US  cc_UK
0  a  b          UK      0      1
1  b  a          US      1      0
2  a  c          RU      0      0
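If some of the codes you ask for might not occur in the data (for example 'GB' when the column only contains 'UK'), plain list selection will still raise a KeyError. One possible workaround, sketched here under the same assumptions as above, is reindex with fill_value=0, which creates an all-zero column for any missing code. The wanted list is a hypothetical name for the codes you want to keep, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'],
                   'CountryCode': ['UK', 'US', 'RU']})

# Hypothetical list of country codes to turn into dummy columns;
# 'GB' is deliberately absent from the data.
wanted = ['US', 'UK', 'GB']

# prefix='cc' with the default separator gives names like 'cc_US'
dummies = pd.get_dummies(df['CountryCode'], prefix='cc', dtype=int)

# Keep only the requested columns; codes not present become all-zero columns
wanted_cols = ['cc_' + code for code in wanted]
selected = dummies.reindex(columns=wanted_cols, fill_value=0)

result = pd.concat([df, selected], axis=1)
print(result)
```

Whether an all-zero column is useful in a regression is a separate question; you may prefer to drop such columns before fitting.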

Upvotes: 1
