Fra
Fra

Reputation: 5188

pandas merge columns to a single time series

I have a data frame with 3 boolean columns:

    A     B     C
0   True  False False
1   False True  False
2   True  Nan   False
3   False False True
...

Only one column is true at each time, but there can be Nan.

I would like to get a list of column names where the name is chosen based on the boolean. So for the example above:

['A', 'B', 'A', 'C']

it's a simple matrix operation, not sure how to map it to pandas...

Upvotes: 3

Views: 265

Answers (2)

Zeugma
Zeugma

Reputation: 32095

You can use the mul operator between the dataframe and the dataframe columns. That results in True cells containing the column name and False cells empty. Eventually you can just sum the row data:

df.mul(df.columns).sum(axis=1)
Out[44]: 
0    A
1    B
2    A
3    C

Upvotes: 2

alko
alko

Reputation: 48297

You can index columns names, i.e. df.columns, with proper indexes:

>>> import numpy as np
>>> df.columns[(df * np.arange(df.values.shape[1])).sum(axis=1)]
Index([u'A', u'B', u'A', u'C'], dtype=object)

Explanation.

Expression

>>> df * np.arange(df.values.shape[1])
   A  B  C
0  0  0  0
1  0  1  0
2  0  0  0
3  0  0  2

calculates for each column a proper index, then matrix is summed row-wize with

>>> (df * np.arange(df.values.shape[1])).sum(axis=1)
0    0
1    1
2    0
3    2
dtype: int32

Upvotes: 0

Related Questions