Reputation: 136187

How do I create dummy variables for a subset of a categorical variable?

Example

>>> import pandas as pd
>>> s = pd.Series(list('abca'))
>>> s
0    a
1    b
2    c
3    a
dtype: object
>>> pd.get_dummies(s)
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
3  1  0  0

Now I would like to create dummy variables for a and b only, and for nothing else. How can I do that?

What I tried

>>> pd.get_dummies(s, columns=['a', 'b'])
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
3  1  0  0

Upvotes: 1

Answers (3)

Zero

Reputation: 76917

Another way is to build each dummy column by comparing the series against the category value:

In [3907]: pd.DataFrame({c:s.eq(c).astype(int) for c in ['a', 'b']})
Out[3907]:
   a  b
0  1  0
1  0  1
2  0  0
3  1  0

Or, equivalently, use (s == c).astype(int) in place of s.eq(c).astype(int).

Upvotes: 0

EdChum

Reputation: 393903

A simpler method is to select just the columns of interest from the resulting DataFrame:

In[16]:
pd.get_dummies(s)[list('ab')]

Out[16]: 
   a  b
0  1  0
1  0  1
2  0  0
3  1  0

This computes every dummy column first and then keeps only the columns of interest.

If you don't want to compute dummy columns for categories you are not interested in, you can first filter the series down to the rows of interest. This requires reindexing with a fill_value afterwards to restore the dropped rows (thanks to @jezrael for the suggestion):

In[20]:
pd.get_dummies(s[s.isin(list('ab'))]).reindex(s.index, fill_value=0)

Out[20]: 
   a  b
0  1  0
1  0  1
2  0  0
3  1  0
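The two approaches above can be compared side by side. The following is a minimal sketch, not part of the original answer; the name `wanted` is a hypothetical helper, and `dtype=int` is passed on the assumption that a recent pandas version is in use, where `get_dummies` may return booleans rather than the 0/1 integers shown above.

```python
import pandas as pd

s = pd.Series(list("abca"))
wanted = ["a", "b"]  # hypothetical name for the categories to keep

# Option 1: encode every category, then select the columns of interest.
selected = pd.get_dummies(s, dtype=int)[wanted]

# Option 2: encode only the rows holding a wanted value, then reindex
# so the dropped rows come back as all-zero rows.
filtered = pd.get_dummies(s[s.isin(wanted)], dtype=int).reindex(
    s.index, fill_value=0
)

print(selected)
print(filtered)
```

One difference worth keeping in mind: if a wanted category never occurs in `s`, the column selection in option 1 raises a KeyError, whereas option 2 simply produces no column for it.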

Upvotes: 3

Martin Thoma

Reputation: 136187

Setting everything else to NaN is one option (note that this modifies s in place):

s[~s.isin(['a', 'b'])] = float('nan')

which yields:

>>> pd.get_dummies(s)
   a  b
0  1  0
1  0  1
2  0  0
3  1  0
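If the original series should stay untouched, the same idea can be sketched without in-place assignment. This is an illustrative variant rather than the answer's own code; it relies on `Series.where`, which keeps values where the condition holds and replaces the rest with NaN, and on `get_dummies` skipping NaN by default. The names `masked` and `dummies` are hypothetical, and `dtype=int` is an assumption made so the result shows 0/1 on newer pandas versions.

```python
import pandas as pd

s = pd.Series(list("abca"))

# Keep 'a' and 'b'; every other value becomes NaN in a new Series,
# so s itself is left unchanged.
masked = s.where(s.isin(["a", "b"]))

# NaN entries get no column of their own and show up as all-zero rows.
dummies = pd.get_dummies(masked, dtype=int)

print(dummies)
```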

Upvotes: 0
