Ank

Pandas groupby and conditional check on multiple columns

I have a dataframe like so:

id date         status value
1  2009-06-17   1      NaN  
1  2009-07-17   B      NaN 
1  2009-08-17   A      NaN 
1  2009-09-17   5      NaN 
1  2009-10-17   0      0.55
2  2010-07-17   B      NaN 
2  2010-08-17   A      NaN 
2  2010-09-17   0      0.00
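If you want to run the answers below, the sample frame can be rebuilt roughly like this. This is a sketch. Two choices here are assumptions, because the original dtypes aren't shown: status is stored as strings, and date is parsed with pd.to_datetime.

```python
import pandas as pd

nan = float('nan')
# Sample data from the question; status kept as strings (assumed dtype)
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2009-06-17', '2009-07-17', '2009-08-17',
                            '2009-09-17', '2009-10-17', '2010-07-17',
                            '2010-08-17', '2010-09-17']),
    'status': ['1', 'B', 'A', '5', '0', 'B', 'A', '0'],
    'value': [nan, nan, nan, nan, 0.55, nan, nan, 0.00],
})
print(df)
```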

Now I want to group by id and check whether value becomes non-zero after status changes to A. For the group with id=1, status changes to A, and after that (in terms of date) value becomes non-zero. For the group with id=2, status also changes to A, but value never becomes non-zero afterwards. Note that if status never changes to A, I don't need to check value at all.

So finally I want a new dataframe like this:

id check
1  True
2  False
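Read literally, the requirement is a per-group rule: sort by date, find the first A, and look for a non-NaN, non-zero value from that row onward. Groups without any A report False. The following is a minimal sketch of that reading using a plain helper function. becomes_nonzero_after_a is a hypothetical name, and counting the A row itself is an assumption.

```python
import pandas as pd

nan = float('nan')
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2009-06-17', '2009-07-17', '2009-08-17',
                            '2009-09-17', '2009-10-17', '2010-07-17',
                            '2010-08-17', '2010-09-17']),
    'status': ['1', 'B', 'A', '5', '0', 'B', 'A', '0'],
    'value': [nan, nan, nan, nan, 0.55, nan, nan, 0.00],
})


def becomes_nonzero_after_a(group):
    """Hypothetical helper: True if a non-NaN, non-zero value appears
    from the first 'A' row onward, with rows ordered by date."""
    group = group.sort_values('date')
    is_a = group['status'].eq('A').to_numpy()
    if not is_a.any():
        # status never changes to A, so value is not checked
        return False
    first_a = int(is_a.argmax())  # position of the first 'A'
    after = group['value'].iloc[first_a:]
    return bool((after.notna() & after.ne(0)).any())


result = pd.DataFrame(
    [(key, becomes_nonzero_after_a(g)) for key, g in df.groupby('id')],
    columns=['id', 'check'],
)
print(result)
```

This is easy to read but runs Python code once per group. The vectorized answer below scales better on large frames.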

Answers (1)

jezrael

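Use a per-group cumulative sum of the A mask to keep only the rows from the first A onward, then test value in those rows. The sample data below is extended with id=3, which never has status A: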

print (df)
   id        date status  value
0   1  2009-06-17      1    NaN
1   1  2009-07-17      B    NaN
2   1  2009-08-17      A    NaN
3   1  2009-09-17      5    NaN
4   1  2009-10-17      0   0.55
5   2  2010-07-17      B    NaN
6   2  2010-08-17      A    NaN
7   2  2010-09-17      0   0.00
8   3  2010-08-17      R    NaN
9   3  2010-09-17      0   0.00

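#all ids, used later to add groups without any A
idx = df['id'].unique()
#mask of rows where status is A
m = df['status'].eq('A')
#keep the first A and all following rows per id (assumes rows are sorted by date within each id)
df1 = df[m.groupby(df['id']).cumsum().gt(0)]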
print (df1)
   id        date status  value
2   1  2009-08-17      A    NaN
3   1  2009-09-17      5    NaN
4   1  2009-10-17      0   0.55
6   2  2010-08-17      A    NaN
7   2  2010-09-17      0   0.00
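The cumsum filter relies on the rows already being in date order within each id. Below is a minimal sketch that sorts first, which is an extra step not in the original code. It also prints the per-group running count of A rows. That count is 0 before the first A and at least 1 from it onward.

```python
import pandas as pd

nan = float('nan')
# Sample frame from the answer, including id=3, which has no A
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 2, 2, 2, 3, 3],
    'date': pd.to_datetime(['2009-06-17', '2009-07-17', '2009-08-17',
                            '2009-09-17', '2009-10-17', '2010-07-17',
                            '2010-08-17', '2010-09-17', '2010-08-17',
                            '2010-09-17']),
    'status': ['1', 'B', 'A', '5', '0', 'B', 'A', '0', 'R', '0'],
    'value': [nan, nan, nan, nan, 0.55, nan, nan, 0.00, nan, 0.00],
})

# Sort so that "after" means later in time within each id
df = df.sort_values(['id', 'date'])
m = df['status'].eq('A')
# Running count of A rows per id: 0 before the first A, >= 1 from it onward
counter = m.groupby(df['id']).cumsum()
df1 = df[counter.gt(0)]
print(counter.tolist())
print(df1)
```

Only rows with a counter greater than 0 survive. Because this sample is already in date order, the sort changes nothing and df1 keeps the same rows (index 2, 3, 4, 6, 7).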

#True if no value after A equals 0 (NaN counts as non-zero), then reindex adds ids without any A as False
df2 = (df1['value'].ne(0)
                   .groupby(df1['id'])
                   .all()
                   .reindex(idx, fill_value=False)
                   .reset_index(name='check'))
print (df2)
   id  check
0   1   True
1   2  False
2   3  False

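ne(0) treats NaN as non-zero. As a result, a group whose values after A are all NaN would get check=True. If the check should instead require at least one real non-zero value, one possible variant replaces all() with any() on a notna() & ne(0) mask. This is a sketch of an alternative reading, not the original answer. On this sample it gives the same result.

```python
import pandas as pd

nan = float('nan')
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 2, 2, 2, 3, 3],
    'date': pd.to_datetime(['2009-06-17', '2009-07-17', '2009-08-17',
                            '2009-09-17', '2009-10-17', '2010-07-17',
                            '2010-08-17', '2010-09-17', '2010-08-17',
                            '2010-09-17']),
    'status': ['1', 'B', 'A', '5', '0', 'B', 'A', '0', 'R', '0'],
    'value': [nan, nan, nan, nan, 0.55, nan, nan, 0.00, nan, 0.00],
})

idx = df['id'].unique()
m = df['status'].eq('A')
df1 = df[m.groupby(df['id']).cumsum().gt(0)]

# Stricter reading: at least one non-NaN, non-zero value from the first A on
nonzero = df1['value'].notna() & df1['value'].ne(0)
df2 = (nonzero.groupby(df1['id'])
              .any()
              .reindex(idx, fill_value=False)
              .reset_index(name='check'))
print(df2)
```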