Bruno Mello
Bruno Mello

Reputation: 4618

Get name of colums where true in pandas dataframe

Suppose I have the following pandas dataframe:

df = pd.DataFrame({'id': [1,2,3,4,5], 'a': [True, True, False, True, False], 'b': [False, True, False, False, True], 'c': [False, False, True, True, True]})

 id      a      b      c
  1   True  False  False
  2   True   True  False
  3  False  False   True
  4   True  False   True
  5  False   True   True

And I want, for each id, get the name of the columns where True, the final dict would be:

{1: ['a'], 2: ['a', 'b'], 3: ['c'], 4: ['a', 'c'], 5: ['b', 'c']}

I think maybe it is possible with a group by replacing True with the name of the column and them aggregating it with list but I couldn't come up with a solution.

EDIT: If it is all false for an id, then just return an empty list.

Upvotes: 4

Views: 967

Answers (3)

sammywemmy
sammywemmy

Reputation: 28699

You could try setting id as the index, stacking, filtering for only True values, then, using a defaultdict, get your dictionary:

box = df.set_index("id").stack().loc[lambda x: x].index
from collections import defaultdict

d = defaultdict(list)
for k, v in box:
    d[k].append(v)

d

defaultdict(list,
            {1: ['a'], 2: ['a', 'b'], 3: ['c'], 4: ['a', 'c'], 5: ['b', 'c']})

I reckon that it may be a bit faster than @BEN_YO's awesome solution. As always, your mileage may vary.

If the id has all False:

df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "a": [True, True, False, True, False],
        "b": [False, True, False, False, False],
        "c": [False, False, True, True, False],
    }
)
df

    id     a       b    c
0   1   True    False   False
1   2   True    True    False
2   3   False   False   True
3   4   True    False   True
4   5   False   False   False

You can an extra step, after running the code above:

{k: d[k] for k in df.id}

{1: ['a'], 2: ['a', 'b'], 3: ['c'], 4: ['a', 'c'], 5: []}

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150765

Another way is good (???) old apply, then to_dict

df.set_index('id').apply(lambda x: list(x[x].index), axis=1).to_dict()

Output:

{1: ['a'], 2: ['a', 'b'], 3: ['c'], 4: ['a', 'c'], 5: ['b', 'c']}

Upvotes: 2

BENY
BENY

Reputation: 323306

Try with melt then groupby

out_d = df.melt('id').query('value').groupby('id')['variable'].agg(list).to_dict()
Out[127]: {1: ['a'], 2: ['a', 'b'], 3: ['c'], 4: ['a', 'c'], 5: ['b', 'c']}

Upvotes: 5

Related Questions