Reputation: 77424
Suppose I have the DataFrame below:
>>> dfrm = pandas.DataFrame({
"A":[1,2,3],
"id1":[True, True, False],
"id2":[False, True, False]
})
>>> dfrm
A id1 id2
0 1 True False
1 2 True True
2 3 False False
How can I flatten the two Boolean columns into a new column (that possibly will cause rows of the DataFrame to need to be repeated), such that in the new column, the entries for all of the True occurrences appear.
Specifically, in the example above, I would want the output to look like this:
index A id1 id2 all_ids
0 1 True False id1
1 2 True True id1
1 2 True True id2
2 3 False False NaN
(preferably not multi-indexed on all_ids
but I would take that if it was the only way to do it).
I've commonly seen this as "wide to long" and the inverse (going from one column to a bunch of Booleans) as "long to wide".
Is there any built-in support for this in Pandas?
Upvotes: 1
Views: 2366
Reputation: 16960
Off-hand I can't recall a function that does this in pandas as a one-liner, but you can do something like this:
In [35]: st = dfrm.ix[:, ['id1', 'id2']].stack()
In [36]: all_ids = Series(st.index.get_level_values(1),
st.index.get_level_values(0),
name='all_ids')[st.values]
In [37]: dfrm.join(all_ids, how='left')
Out[37]:
A id1 id2 all_ids
0 1 True False id1
1 2 True True id1
1 2 True True id2
2 3 False False NaN
Upvotes: 2