I have a dataframe in which I want to count, for each ID, the rows at and after the point where a value reaches a threshold. For example:
index DEVICE_ID DIFF
0 12 3
1 12 4
2 12 5
3 12 3
4 13 2
5 13 4
6 13 1
7 14 3
8 14 6
If DIFF is greater than or equal to 4, I want the number of rows for that ID starting from that index, for each unique ID. So the above dataframe should result in:
{12: 3, 13: 2, 14: 1}
For ID 12, DIFF first reaches 4 at index 1, so we count the rows with ID 12 from index 1 up to and including index 3, which gives 3.
Sorry for the badly worded question.
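To pin down the rule, here is a minimal plain-Python sketch of the expected behaviour, assuming rows are processed in index order. The function name `count_from_first_hit` is hypothetical. It keeps a set of IDs whose DIFF has already reached the threshold and counts every row of those IDs from that point on:

```python
def count_from_first_hit(ids, diffs, threshold=4):
    """Hypothetical helper: per ID, count rows from the first diff >= threshold onward (inclusive)."""
    counts = {}
    hit = set()  # IDs whose diff has already reached the threshold
    for dev_id, diff in zip(ids, diffs):
        if diff >= threshold:
            hit.add(dev_id)
        if dev_id in hit:
            counts[dev_id] = counts.get(dev_id, 0) + 1
    return counts

# Sample data from the question (DEVICE_ID and DIFF columns, in index order)
ids = [12, 12, 12, 12, 13, 13, 13, 14, 14]
diffs = [3, 4, 5, 3, 2, 4, 1, 3, 6]
print(count_from_first_hit(ids, diffs))  # {12: 3, 13: 2, 14: 1}
```

IDs that never reach the threshold are left out of this dict, whereas the cumprod and cumsum answers below report them with a count of 0.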
Using df.shift()
df['T_F']=(df.DIFF>=4)
df[df.T_F != df.T_F.shift(1)].groupby('DEVICE_ID')['DEVICE_ID'].count().to_dict()
{12: 3, 13: 2, 14: 1}
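A caveat worth checking: the mask `df.T_F != df.T_F.shift(1)` keeps the rows where T_F changes value (plus the very first row), not every row from the first DIFF >= 4 onward, and `shift(1)` is not applied per DEVICE_ID. It happens to give the expected dict for the sample data. The sketch below uses a hypothetical four-row frame in which DIFF stays at or above 4, counts the masked rows per ID with `.size()`, and compares against the cumsum-based rule from the last answer:

```python
import pandas as pd

# Hypothetical data: DIFF stays >= 4 for three consecutive rows of ID 12
df = pd.DataFrame({'DEVICE_ID': [12, 12, 12, 12], 'DIFF': [3, 4, 5, 6]})

# shift-based mask: rows where T_F differs from the previous row
df['T_F'] = df.DIFF >= 4
shift_count = df[df.T_F != df.T_F.shift(1)].groupby('DEVICE_ID').size().to_dict()

# Rule from the question: every row from the first DIFF >= 4 onward, per ID
hit_count = (df['DIFF'].ge(4)
             .groupby(df['DEVICE_ID']).cumsum()
             .gt(0)
             .groupby(df['DEVICE_ID']).sum()
             .to_dict())

print(shift_count)  # {12: 2}
print(hit_count)    # {12: 3}
```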
Using cumprod
s=df.DIFF.lt(4).astype(int).groupby(df['DEVICE_ID']).cumprod()
s=(1-s).groupby(df['DEVICE_ID']).sum()
s
DEVICE_ID
12 3
13 2
14 1
Name: DIFF, dtype: int32
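To see why this works, it can help to print the intermediate flag. A minimal sketch, rebuilding the sample frame from the question: the group-wise cumprod of the "DIFF < 4" indicator stays 1 until the first DIFF >= 4 in each DEVICE_ID and is 0 from then on, because once a 0 enters a running product it stays 0.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({'DEVICE_ID': [12, 12, 12, 12, 13, 13, 13, 14, 14],
                   'DIFF':      [3, 4, 5, 3, 2, 4, 1, 3, 6]})

# 1 while every row so far in the group is below 4, 0 from the first DIFF >= 4 onward
below = df.DIFF.lt(4).astype(int).groupby(df['DEVICE_ID']).cumprod()
print(below.tolist())  # [1, 0, 0, 0, 1, 0, 0, 1, 0]

# 1 - below marks rows at or after the first hit; summing per ID counts them
counts = (1 - below).groupby(df['DEVICE_ID']).sum()
print(counts.to_dict())  # {12: 3, 13: 2, 14: 1}
```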
Compare the column with Series.ge (>=) first, then group by df['DEVICE_ID'] and take the running total with cumsum, compare with Series.gt (> 0) to mark every row from the first match onward, and aggregate with sum to count the True values:
s = df['DIFF'].ge(4).groupby(df['DEVICE_ID']).cumsum().gt(0).astype(int)
out = s.groupby(df['DEVICE_ID']).sum().to_dict()
print (out)
{12: 3, 13: 2, 14: 1}
Detail:
print (df['DIFF'].ge(4).groupby(df['DEVICE_ID']).cumsum())
index
0 0.0
1 1.0
2 2.0
3 2.0
4 0.0
5 1.0
6 1.0
7 0.0
8 1.0
Name: DIFF, dtype: float64
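Because cumsum runs within each DEVICE_ID group in row order, this approach does not require the rows of an ID to be contiguous, and an ID that never reaches the threshold gets 0. A quick check with a hypothetical interleaved frame (the data are made up for illustration):

```python
import pandas as pd

# Hypothetical frame: IDs 12 and 13 interleaved, ID 15 never reaches the threshold
df = pd.DataFrame({'DEVICE_ID': [12, 13, 12, 13, 12, 15],
                   'DIFF':      [3, 5, 4, 1, 2, 1]})

# Same expression as the answer: running count of DIFF >= 4 per ID, then "> 0" as a latch
s = df['DIFF'].ge(4).groupby(df['DEVICE_ID']).cumsum().gt(0).astype(int)
out = s.groupby(df['DEVICE_ID']).sum().to_dict()
print(out)  # {12: 2, 13: 2, 15: 0}
```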
Another solution: set DEVICE_ID as the index, group by the index with level=0, and finally sum per index value (level=0):
out = (df.set_index(['DEVICE_ID'])['DIFF']
.ge(4)
.groupby(level=0)
.cumsum()
.gt(0)
.astype(int)
.groupby(level=0)  # Series.sum(level=0) was removed in pandas 2.0
.sum()
.to_dict())