MaMo

Reputation: 585

Python: use apply on groups separately after grouping a DataFrame

My data frame looks like this:

┌────┬──────┬──────┐
│ No │ col1 │ col2 │
├────┼──────┼──────┤
│  1 │ A    │  5.0 │
│  1 │ B1   │ 10.0 │
│  1 │ B2   │ 20.0 │
│  2 │ A    │  0.0 │
│  2 │ B1   │  0.0 │
│  2 │ C1   │  0.0 │
│  3 │ A    │  0.0 │
│  3 │ B1   │  5.0 │
│  3 │ C1   │ 20.0 │
│  3 │ C2   │ 30.0 │
└────┴──────┴──────┘

First I used groupby to group the data frame by column No.
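
For reference, the table and the grouping step can be reproduced roughly as follows. This is only a minimal sketch: the variable names `df` and `grouped` are assumptions, and the values are copied from the table above.

```python
import pandas as pd

# Rebuild the example frame from the table; `df` is an assumed name.
df = pd.DataFrame({
    "No":   [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "col1": ["A", "B1", "B2", "A", "B1", "C1", "A", "B1", "C1", "C2"],
    "col2": [5.0, 10.0, 20.0, 0.0, 0.0, 0.0, 0.0, 5.0, 20.0, 30.0],
})

# Group by the No column and look at the col2 values of each group.
grouped = df.groupby("No")
for no, group in grouped:
    print(no, group["col2"].tolist())
```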

I would like to do three things now:

  1. get a list of the No values where col2 == 0.0 in all rows of the group (in this case No. 2)
  2. get a list of the No values where col2 != 0.0 in the row with col1 == 'A' but at least one other row of the group has col2 == 0.0 (in this case No. 3)
  3. get a list of the No values where at least one row of the group has col2 == 0.0 (No. 2 and 3)

Sorry for asking three issues at once. Hope that is ok.

Thank you:)

Upvotes: 1

Views: 45

Answers (1)

jezrael

Reputation: 862661

You can use:

# boolean mask of zero values, grouped by the No column
g = df['col2'].eq(0).groupby(df['No'])
# 1. groups where every row has col2 == 0
a = g.all()
a = a.index[a].tolist()
print (a)
[2]

# 2. groups where the 'A' row is non-zero ...
b1 = (df['col2'].ne(0) & df['col1'].eq('A')).groupby(df['No']).any()
# ... and at least one non-'A' row is zero
b2 = (df['col2'].eq(0) & df['col1'].ne('A')).groupby(df['No']).any()
b = b1 & b2
b = b.index[b].tolist()
print (b)
[]

c = g.any()
c = c.index[c].tolist()
print (c)
[2, 3]
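
To see why the second list is empty for the sample data, the two intermediate masks can be put side by side per group. This is just an illustrative sketch: the frame is rebuilt from the table in the question, and the names `a_nonzero`, `other_zero` and `check` are made up for this example.

```python
import pandas as pd

# Sample data copied from the question; `df` is an assumed name.
df = pd.DataFrame({
    "No":   [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "col1": ["A", "B1", "B2", "A", "B1", "C1", "A", "B1", "C1", "C2"],
    "col2": [5.0, 10.0, 20.0, 0.0, 0.0, 0.0, 0.0, 5.0, 20.0, 30.0],
})

# Per group: is the 'A' row non-zero?
a_nonzero = (df["col2"].ne(0) & df["col1"].eq("A")).groupby(df["No"]).any()
# Per group: is any non-'A' row zero?
other_zero = (df["col2"].eq(0) & df["col1"].ne("A")).groupby(df["No"]).any()

# Both conditions must hold in the same group for it to land in the list.
check = pd.DataFrame({"a_nonzero": a_nonzero, "other_zero": other_zero})
print(check)
```

In this sample, the 'A' row of No 3 has col2 == 0.0, so no group satisfies both conditions at once.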

Another solution is a custom function that returns one row of booleans per group, so apply gives a boolean DataFrame; a dictionary with the 3 lists is then built from it:

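import pandas as pd

def f(x):
    # x is the sub-frame of one group
    a = x['col2'].eq(0)
    b1 = x['col2'].ne(0) & x['col1'].eq('A')
    b2 = a & x['col1'].ne('A')
    b = b1.any() & b2.any()

    # one boolean per task: all zero, 'A' non-zero with another zero, any zero
    return pd.Series([a.all(), b, a.any()], index=list('abc'))

m = df.groupby('No').apply(f)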
print (m)
        a      b      c
No                     
1   False  False  False
2    True  False   True
3   False  False   True

fin = {x: m[x].index[m[x]].tolist() for x in m.columns}
print (fin)
{'a': [2], 'b': [], 'c': [2, 3]}
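
If calling a Python function once per group is too slow, the same three flags could probably also be computed without apply, by building the row-level masks first and reducing them with named aggregation. This is an alternative sketch, not part of the solutions above: the frame is rebuilt from the question's table, and the names `flags`, `zero`, `a_nonzero` and `other_zero` are assumptions chosen for illustration.

```python
import pandas as pd

# Sample data copied from the question; `df` is an assumed name.
df = pd.DataFrame({
    "No":   [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "col1": ["A", "B1", "B2", "A", "B1", "C1", "A", "B1", "C1", "C2"],
    "col2": [5.0, 10.0, 20.0, 0.0, 0.0, 0.0, 0.0, 5.0, 20.0, 30.0],
})

# One boolean column per row-level condition.
flags = pd.DataFrame({
    "zero": df["col2"].eq(0),
    "a_nonzero": df["col2"].ne(0) & df["col1"].eq("A"),
    "other_zero": df["col2"].eq(0) & df["col1"].ne("A"),
})

# Reduce each condition per group with named aggregation.
m = flags.groupby(df["No"]).agg(
    a=("zero", "all"),
    b1=("a_nonzero", "any"),
    b2=("other_zero", "any"),
    c=("zero", "any"),
)
# Task 2 needs both partial conditions to hold in the same group.
m["b"] = m["b1"] & m["b2"]

fin = {k: m.index[m[k]].tolist() for k in ["a", "b", "c"]}
print(fin)
```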

Upvotes: 1
