I would like to maintain the multilevel structure of my columns while applying get_dummies() to particular subcolumns.
For example, given the dataframe:
In [1]: import pandas as pd
df = pd.DataFrame({('A','one'):['a','a','b'],
('A','two'):['b','a','a'],
('B','one'):['b','b','a'],
('B','two'):['a','a','a'],
('C','one'):['b','a','b'],
('C','two'):['a','b','a'],})
df
Out[1]:
    A       B       C
  one two one two one two
0   a   b   b   a   b   a
1   a   a   b   a   a   b
2   b   a   a   a   b   a
I'd like to produce something along the lines of the following:
      A               B               C
  one_a one_b two one_a one_b two one_a one_b two
0     1     0   b     0     1   a     0     1   a
1     1     0   a     0     1   a     1     0   b
2     0     1   a     1     0   a     0     1   a
How can I produce a result similar to the one above? How do I encode a subcolumn as a one-hot vector without affecting the multilevel structure of the dataframe?
I have tried the code below, and I understand why it does not work: I cannot insert two columns in place of one.
In [2]: df.loc[:, (slice(None),'one')] = pd.get_dummies(df.loc[:, (slice(None),'one')])
df
Out[2]:
    A       B       C
  one two one two one two
0 NaN   b NaN   a NaN   a
1 NaN   a NaN   a NaN   b
2 NaN   a NaN   a NaN   a
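A quick way to see the label mismatch behind those NaNs is to compare the labels of the slice being assigned to with the labels get_dummies() returns for it. This sketch rebuilds the question's frame and assumes pandas 0.23 or later (for the dtype= argument). Because .loc assignment aligns a DataFrame right-hand side on its labels, an empty overlap means every target cell is filled with NaN.

```python
import pandas as pd

df = pd.DataFrame({('A', 'one'): ['a', 'a', 'b'],
                   ('A', 'two'): ['b', 'a', 'a'],
                   ('B', 'one'): ['b', 'b', 'a'],
                   ('B', 'two'): ['a', 'a', 'a'],
                   ('C', 'one'): ['b', 'a', 'b'],
                   ('C', 'two'): ['a', 'b', 'a']})

# The three ('X', 'one') columns the assignment targets
target = df.loc[:, (slice(None), 'one')]
# get_dummies() returns one column per (column, value) pair under newly built labels
dummies = pd.get_dummies(target, dtype=int)

print(list(target.columns))
print(list(dummies.columns))

# No shared labels, so alignment finds nothing to copy into the targets
overlap = set(target.columns) & set(dummies.columns)
print(overlap)
```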
I know I could also pass drop_first=True to get_dummies(), but that would give me one column instead of two, and it would only work for binary variables.
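To make the drop_first trade-off concrete, here is a small sketch using two made-up Series (not from the question): drop_first=True removes one dummy column per variable, so a binary column shrinks to a single column that could replace the original in place, while a three-category column still expands to two.

```python
import pandas as pd

binary = pd.Series(['a', 'a', 'b'])   # two categories, like the question's columns
ternary = pd.Series(['a', 'b', 'c'])  # three categories

full = pd.get_dummies(binary, dtype=int)                          # one column per category
binary_drop = pd.get_dummies(binary, drop_first=True, dtype=int)  # first category's column dropped
ternary_drop = pd.get_dummies(ternary, drop_first=True, dtype=int)

print(full.shape, binary_drop.shape, ternary_drop.shape)  # (3, 2) (3, 1) (3, 2)
```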
Panda-fu
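# move A/B/C into the index, one-hot encode 'one', then pivot the letters back into the columns
# (dtype=int keeps 1/0 output on pandas 2.x, where get_dummies() defaults to bool)
pd.get_dummies(df.stack(0).one, prefix='one', dtype=int).stack().unstack(0).T.join(
    df.xs('two', axis=1, level=1, drop_level=False)  # untouched 'two' subcolumns, both levels kept
).sort_index(axis=1)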
      A               B               C
  one_a one_b two one_a one_b two one_a one_b two
0     1     0   b     0     1   a     0     1   a
1     1     0   a     0     1   a     1     0   b
2     0     1   a     1     0   a     0     1   a
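The chain above is dense, so here is a sketch that splits it into named intermediate steps (the names stacked, dummies, wide, keep and result are illustrative, not from the answer). It assumes a pandas version whose get_dummies() accepts dtype= (0.23+); on pandas 2.1+ the stack() calls may emit a FutureWarning about the new stack implementation, which does not change the result for this frame.

```python
import pandas as pd

df = pd.DataFrame({('A', 'one'): ['a', 'a', 'b'],
                   ('A', 'two'): ['b', 'a', 'a'],
                   ('B', 'one'): ['b', 'b', 'a'],
                   ('B', 'two'): ['a', 'a', 'a'],
                   ('C', 'one'): ['b', 'a', 'b'],
                   ('C', 'two'): ['a', 'b', 'a']})

# 1. Move the top level (A/B/C) into the row index; columns become 'one', 'two'
stacked = df.stack(0)
# 2. One-hot encode 'one'; each row is a (row number, letter) pair
dummies = pd.get_dummies(stacked['one'], prefix='one', dtype=int)
# 3. Push one_a/one_b into the index, pull row numbers out as columns, transpose:
#    rows are back to 0..2 and columns are (letter, one_a/one_b)
wide = dummies.stack().unstack(0).T
# 4. Select the untouched 'two' subcolumns, keeping both column levels
keep = df.xs('two', axis=1, level=1, drop_level=False)
# 5. Join side by side and sort so each letter's subcolumns sit together
result = wide.join(keep).sort_index(axis=1)
print(result)
```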
Alternative
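def f(d, n, k):
    d = d[n].copy()  # sub-frame under top-level label n, copied so pop() doesn't touch df
    o = d.pop(k)     # remove the subcolumn to encode (e.g. 'one')
    # one-hot encode it and join the remaining subcolumns back on
    return pd.get_dummies(o, prefix=k, dtype=int).join(d)

# df.groupby(axis=1, level=0) is deprecated in recent pandas; iterate over the top-level labels instead
pd.concat({n: f(df, n, 'one') for n in df.columns.unique(level=0)}, axis=1)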
      A               B               C
  one_a one_b two one_a one_b two one_a one_b two
0     1     0   b     0     1   a     0     1   a
1     1     0   a     0     1   a     1     0   b
2     0     1   a     1     0   a     0     1   a
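If the goal is simply to one-hot encode one named subcolumn under every top-level label, get_dummies() can do the split itself through its columns= parameter, which encodes only the listed columns and passes the rest through unchanged. The helper below (encode_subcolumn is a hypothetical name) sketches that variant of the Alternative, assuming pandas 0.23+ and that the top-level labels already sort in the order you want. get_dummies() places the dummy columns after the untouched ones, so the result is sorted at the end to reproduce the layout above.

```python
import pandas as pd

df = pd.DataFrame({('A', 'one'): ['a', 'a', 'b'],
                   ('A', 'two'): ['b', 'a', 'a'],
                   ('B', 'one'): ['b', 'b', 'a'],
                   ('B', 'two'): ['a', 'a', 'a'],
                   ('C', 'one'): ['b', 'a', 'b'],
                   ('C', 'two'): ['a', 'b', 'a']})


def encode_subcolumn(frame, sub):
    """Hypothetical helper: one-hot encode subcolumn `sub` under every top-level label."""
    parts = {
        # frame[top] has single-level columns ('one', 'two'); only `sub` is encoded
        top: pd.get_dummies(frame[top], columns=[sub], dtype=int)
        for top in frame.columns.unique(level=0)
    }
    # concat with a dict rebuilds the top level from the keys; sorting puts
    # one_a, one_b ahead of two within each group
    return pd.concat(parts, axis=1).sort_index(axis=1)


result = encode_subcolumn(df, 'one')
print(result)
```

Encoding a different subcolumn only needs a different second argument, e.g. encode_subcolumn(df, 'two').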