Reputation: 5188
I have a data frame with 3 boolean columns:
A B C
0 True False False
1 False True False
2 True Nan False
3 False False True
...
Only one column is true at each time, but there can be Nan.
I would like to get a list of column names where the name is chosen based on the boolean. So for the example above:
['A', 'B', 'A', 'C']
it's a simple matrix operation, not sure how to map it to pandas...
Upvotes: 3
Views: 265
Reputation: 32095
You can use the mul
operator between the dataframe and the dataframe columns. That results in True
cells containing the column name and False
cells empty. Eventually you can just sum the row data:
df.mul(df.columns).sum(axis=1)
Out[44]:
0 A
1 B
2 A
3 C
Upvotes: 2
Reputation: 48297
You can index columns names, i.e. df.columns
, with proper indexes:
>>> import numpy as np
>>> df.columns[(df * np.arange(df.values.shape[1])).sum(axis=1)]
Index([u'A', u'B', u'A', u'C'], dtype=object)
Explanation.
Expression
>>> df * np.arange(df.values.shape[1])
A B C
0 0 0 0
1 0 1 0
2 0 0 0
3 0 0 2
calculates for each column a proper index, then matrix is summed row-wize with
>>> (df * np.arange(df.values.shape[1])).sum(axis=1)
0 0
1 1
2 0
3 2
dtype: int32
Upvotes: 0