Martin Thoma

Reputation: 136187

How do I create dummy variables for a subset of a categorical variable?

Example

>>> import pandas as pd
>>> s = pd.Series(list('abca'))
>>> s
0    a
1    b
2    c
3    a
dtype: object
>>> pd.get_dummies(s)
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
3  1  0  0

Now I would like dummy variables for a and b only, with no column for c. How can I do that?

What I tried

>>> pd.get_dummies(s, columns=['a', 'b'])
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
3  1  0  0

Upvotes: 1

Views: 444

Answers (3)

Zero

Reputation: 76917

Another way is to build each dummy column directly by comparing the Series against the value:

In [3907]: pd.DataFrame({c:s.eq(c).astype(int) for c in ['a', 'b']})
Out[3907]:
   a  b
0  1  0
1  0  1
2  0  0
3  1  0

Alternatively, use (s == c).astype(int) in place of s.eq(c).astype(int).
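
The comparison approach above can be wrapped in a small helper. This is only a sketch: the function name `dummies_for` and its parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd


def dummies_for(s, values):
    """Hypothetical helper: one 0/1 column per entry in `values`.

    Categories not listed in `values` get no column, and rows holding
    such categories are all zeros.
    """
    # s.eq(v) gives a boolean Series; astype(int) turns it into 0/1.
    return pd.DataFrame({v: s.eq(v).astype(int) for v in values}, index=s.index)


s = pd.Series(list('abca'))
result = dummies_for(s, ['a', 'b'])
print(result)
```

Because each column comes from an explicit comparison, a value in `values` that never occurs in `s` still gets a column (all zeros).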

Upvotes: 0

EdChum

Reputation: 393903

A simpler method is to select only the columns of interest from the resulting DataFrame:

In[16]:
pd.get_dummies(s)[list('ab')]

Out[16]: 
   a  b
0  1  0
1  0  1
2  0  0
3  1  0

This sub-selects the columns of interest from the full dummies DataFrame.

If you don't want to compute dummy columns for values you aren't interested in to begin with, you can first filter the Series down to the values of interest. This drops the other rows, so you then need to reindex with a fill_value to restore them as all-zero rows (thanks to @jezrael for the suggestion):

In[20]:
pd.get_dummies(s[s.isin(list('ab'))]).reindex(s.index, fill_value=0)

Out[20]: 
   a  b
0  1  0
1  0  1
2  0  0
3  1  0
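
For comparison, here is a minimal sketch that runs both approaches side by side. The variable names (`wanted`, `selected`, `filtered`) are illustrative, and `dtype=int` is passed explicitly on the assumption that you want 0/1 output as shown above, since newer pandas versions default to boolean dummies.

```python
import pandas as pd

s = pd.Series(list('abca'))
wanted = ['a', 'b']  # illustrative name for the values of interest

# Approach 1: encode every category, then keep only the wanted columns.
selected = pd.get_dummies(s, dtype=int)[wanted]

# Approach 2: encode only the matching rows, then restore the dropped
# rows as all-zero rows by reindexing against the original index.
filtered = pd.get_dummies(s[s.isin(wanted)], dtype=int).reindex(s.index, fill_value=0)

print(selected)
print(filtered)
```

The two differ when a wanted value never occurs in `s`: the column selection in the first approach raises a KeyError, while the second simply produces no column for that value.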

Upvotes: 3

Martin Thoma

Reputation: 136187

Setting everything else to NaN is one option (note that this modifies s in place):

s[~((s == 'a') | (s == 'b'))] = float('nan')

get_dummies ignores NaN by default, so this yields:

>>> pd.get_dummies(s)
   a  b
0  1  0
1  0  1
2  0  0
3  1  0
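
If the original Series should stay untouched, the same idea can be sketched with `Series.where`, which returns a new Series instead of assigning into `s`. The name `masked` is illustrative, and `dtype=int` is an assumption made to get 0/1 output regardless of pandas version.

```python
import pandas as pd

s = pd.Series(list('abca'))

# Keep values that are 'a' or 'b'; everything else becomes NaN.
# where() returns a new Series, so s itself is not modified.
masked = s.where(s.isin(['a', 'b']))

# NaN entries get no column because dummy_na defaults to False.
result = pd.get_dummies(masked, dtype=int)
print(result)
```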

Upvotes: 0
