Reputation: 136187
>>> import pandas as pd
>>> s = pd.Series(list('abca'))
>>> s
0 a
1 b
2 c
3 a
dtype: object
>>> pd.get_dummies(s)
a b c
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
Now I would like to map a
and b
to a dummy variable, but nothing else. How can I do that?
>>> pd.get_dummies(s, columns=['a', 'b'])
a b c
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
Upvotes: 1
Views: 444
Reputation: 76917
Another way
In [3907]: pd.DataFrame({c:s.eq(c).astype(int) for c in ['a', 'b']})
Out[3907]:
a b
0 1 0
1 0 1
2 0 0
3 1 0
Or, (s==c).astype(int)
Upvotes: 0
Reputation: 393903
A simpler method is to just mask the resultant df with the cols of interest:
In[16]:
pd.get_dummies(s)[list('ab')]
Out[16]:
a b
0 1 0
1 0 1
2 0 0
3 1 0
So this will sub-select the resultant dummies df with the cols of interest
If you don't want to calculate the dummies column for the columns that you are not interested in the first place, then you could filter out the rows of interest but this requires reindex
ing with a fill_value
(thanks to @jezrael for the suggestion):
In[20]:
pd.get_dummies(s[s.isin(list('ab'))]).reindex(s.index, fill_value=0)
Out[20]:
a b
0 1 0
1 0 1
2 0 0
3 1 0
Upvotes: 3
Reputation: 136187
Setting everything else to nan is one option:
s[~((s == 'a') | (s == 'b'))] = float('nan')
which yields:
>>> pd.get_dummies(s)
a b
0 1 0
1 0 1
2 0 0
3 1 0
Upvotes: 0