Reputation: 432
I have the following table of boolean values:
pd.DataFrame(data={'val1': [True, False, False, True],
'val2': [False, True, False, True],
'val3': [True, True, False, True],
'val4': [True, False, True, False],
'val5': [True, True, False, False],
'val6': [False, False, True, True]},
index=pd.Series([1, 2, 3, 4], name='index'))
index | val1 | val2 | val3 | val4 | val5 | val6 |
---|---|---|---|---|---|---|
1 | True | False | True | True | True | False |
2 | False | True | True | False | True | False |
3 | False | False | False | True | False | True |
4 | True | True | True | False | False | True |
I want to create a new dataframe with the same indices, but each row has the first three True column names from the previous column.
index | TrueVal1 | TrueVal2 | TrueVal3 |
---|---|---|---|
1 | val1 | val3 | val4 |
2 | val2 | val3 | val5 |
3 | val4 | val6 | NaN |
4 | val1 | val2 | val3 |
If a row has fewer than three True values, the new dataframe will have Null values.
Upvotes: 3
Views: 497
Reputation: 59519
You can do this with numpy
+ argsort
to slice the columns by the position of the True
values.
Then use where
to NaN
rows that were False
incase there were too few True
values.
import numpy as np
import pandas as pd
# Get the first `N` True columns.
N = 3
arr = df.to_numpy()
data = df.columns.to_numpy()[(~arr).argsort(axis=1, kind='stable')[:, :N]]
mask = np.ones_like(data).cumsum(1) <= arr.sum(1)[:, None]
res = pd.DataFrame(data, columns=[f'TrueVal{i+1}' for i in range(N)],
index=df.index).where(mask)
print(res)
TrueVal1 TrueVal2 TrueVal3
index
1 val1 val3 val4
2 val2 val3 val5
3 val4 val6 NaN
4 val1 val2 val3
Upvotes: 2
Reputation: 323226
Try with dot
, then split
#df = df.set_index('index')
out = df.dot(df.columns + ',').str[:-1].str.split(',',expand=True).iloc[:,:3]
out
Out[258]:
0 1 2
index
1 val1 val3 val4
2 val2 val3 val5
3 val4 val6 None
4 val1 val2 val3
Upvotes: 3