Reputation: 3090
I have a pandas dataframe column filled with country codes for 100 countries. I want to use these to do a regression, but I only want to create dummy variables for specific countries in my dataset.
I thought this would work:
dummies = pd.get_dummies(df.CountryCode, prefix='cc')
df_and_dummies = pd.concat([df,dummies[dummies['cc_US', 'cc_GB']]], axis=1)
df_and_dummies
But it gives me the error:
KeyError: ('cc_US', 'cc_GB')
My dataframe currently looks something like:
dframe = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'],
'CountryCode': ['UK', 'US', 'RU']})
dframe
But I want it to look like this:
Is there a simple way to specify which values to include when calling get_dummies, or is there another way to create only specific dummy variables?
Upvotes: 1
Views: 3754
Reputation: 139142
The dummies DataFrame looks like this:
In [25]: dummies
Out[25]:
   cc_RU  cc_UK  cc_US
0      0      1      0
1      0      0      1
2      1      0      0
To select only certain columns, pass a list of column names inside the [] indexer:
In [27]: dummies[['cc_US', 'cc_UK']]
Out[27]:
   cc_US  cc_UK
0      0      1
1      1      0
2      0      0
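So you were just missing one pair of brackets. dummies['cc_US', 'cc_GB'] passes a single tuple key, ('cc_US', 'cc_GB'), which is why pandas raised the KeyError, whereas dummies[['cc_US', 'cc_UK']] passes a list of column names. Note also that your sample data uses UK, so the column is cc_UK rather than cc_GB.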
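To make the difference between the two lookups concrete, here is a minimal sketch. The dummies frame is rebuilt by hand from the output shown above, and tuple_lookup_failed and selected are illustrative names, not part of the original code.

```python
import pandas as pd

# Hand-built copy of the dummies frame shown above (illustrative only).
dummies = pd.DataFrame({
    "cc_RU": [0, 0, 1],
    "cc_UK": [1, 0, 0],
    "cc_US": [0, 1, 0],
})

# Two strings separated by a comma form a tuple, and pandas looks that
# tuple up as ONE column label, which does not exist here.
try:
    dummies["cc_US", "cc_UK"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# A list of labels selects several columns, in the order given.
selected = dummies[["cc_US", "cc_UK"]]
```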
The full code then becomes:
In [29]: pd.concat([df, dummies[['cc_US', 'cc_UK']]], axis=1)
Out[29]:
   A  B CountryCode  cc_US  cc_UK
0  a  b          UK      0      1
1  b  a          US      1      0
2  a  c          RU      0      0
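For reference, here is a self-contained sketch that runs end to end on the sample frame from the question. It makes a few assumptions that are not in the original posts: prefix='cc' is used because the default separator '_' is added automatically (prefix='cc_' would produce names like cc__US), dtype=int is passed so the dummies show as 0/1 on recent pandas versions, and reindex is swapped in for plain [] selection so that a wanted country missing from the data becomes a column of zeros instead of raising a KeyError. The names wanted and selected are illustrative.

```python
import pandas as pd

dframe = pd.DataFrame({
    "A": ["a", "b", "a"],
    "B": ["b", "a", "c"],
    "CountryCode": ["UK", "US", "RU"],
})

# prefix="cc" plus the default prefix_sep="_" yields names such as cc_US.
dummies = pd.get_dummies(dframe["CountryCode"], prefix="cc", dtype=int)

# Only these dummy columns are wanted for the regression.
wanted = ["cc_US", "cc_UK"]

# reindex keeps the wanted columns and fills any that are absent from
# the data with 0, rather than raising a KeyError.
selected = dummies.reindex(columns=wanted, fill_value=0)

df_and_dummies = pd.concat([dframe, selected], axis=1)
print(df_and_dummies)
```

If every wanted country is guaranteed to appear in the data, dummies[wanted] gives the same result as the reindex call.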
Upvotes: 1