ely
ely

Reputation: 77424

Python Pandas: converting several boolean columns into a (possibly repeated) column made up of the boolean column names

Suppose I have the DataFrame below:

>>> dfrm = pandas.DataFrame({
                             "A":[1,2,3], 
                             "id1":[True, True, False], 
                             "id2":[False, True, False]
                            })

>>> dfrm
   A    id1    id2
0  1   True  False
1  2   True   True
2  3  False  False

How can I flatten the two Boolean columns into a new column (that possibly will cause rows of the DataFrame to need to be repeated), such that in the new column, the entries for all of the True occurrences appear.

Specifically, in the example above, I would want the output to look like this:

index A   id1    id2   all_ids
    0 1  True  False       id1
    1 2  True   True       id1
    1 2  True   True       id2
    2 3 False  False       NaN

(preferably not multi-indexed on all_ids but I would take that if it was the only way to do it).

I've commonly seen this as "wide to long" and the inverse (going from one column to a bunch of Booleans) as "long to wide".

Is there any built-in support for this in Pandas?

Upvotes: 1

Views: 2366

Answers (1)

Chang She
Chang She

Reputation: 16960

Off-hand I can't recall a function that does this in pandas as a one-liner, but you can do something like this:

In [35]: st = dfrm.ix[:, ['id1', 'id2']].stack()

In [36]: all_ids = Series(st.index.get_level_values(1), 
                          st.index.get_level_values(0),
                          name='all_ids')[st.values]

In [37]: dfrm.join(all_ids, how='left')
Out[37]: 
   A    id1    id2 all_ids
0  1   True  False     id1
1  2   True   True     id1
1  2   True   True     id2
2  3  False  False     NaN

Upvotes: 2

Related Questions