samari708

Reputation: 77

Count of dataframe column above a threshold

I have a dataframe where I want to count, for each DEVICE_ID, the rows at and after the point where DIFF first reaches a threshold. For example:

  index  DEVICE_ID DIFF
   0         12     3
   1         12     4
   2         12     5
   3         12     3
   4         13     2
   5         13     4
   6         13     1
   7         14     3
   8         14     6

If DIFF is greater than or equal to 4, give me the count of rows for that ID starting from that index, for each unique ID. The dataframe above should therefore give:

  {12: 3, 13: 2, 14: 1} - For ID 12, DIFF is 4 at index 1, so we count the rows with ID 12 from index 1 through index 3 (inclusive), which gives 3.
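To pin down the rule, here is a minimal sketch that applies it literally, one DEVICE_ID at a time. It rebuilds the sample data above and assumes that an ID whose DIFF never reaches the threshold should map to 0; the helper name `count_from_threshold` is hypothetical.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "DEVICE_ID": [12, 12, 12, 12, 13, 13, 13, 14, 14],
    "DIFF":      [3, 4, 5, 3, 2, 4, 1, 3, 6],
})

def count_from_threshold(frame, threshold=4):
    """Hypothetical helper: for each DEVICE_ID, count the rows from the first
    row with DIFF >= threshold through the last row of that ID (inclusive)."""
    result = {}
    for device_id, group in frame.groupby("DEVICE_ID"):
        hits = (group["DIFF"] >= threshold).to_numpy()
        if hits.any():
            first = hits.argmax()  # position of the first True within this group
            result[int(device_id)] = int(len(group) - first)
        else:
            result[int(device_id)] = 0  # assumption: never reaching the threshold counts as 0
    return result

print(count_from_threshold(df))
```

This is a slow, explicit reference for checking the vectorized answers below, not a replacement for them.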

Sorry for the badly worded question.

Upvotes: 3

Views: 498

Answers (3)

anky

Reputation: 75080

Using df.shift()

df['T_F']=(df.DIFF>=4)
df[df.T_F != df.T_F.shift(1)].groupby('DEVICE_ID')['DEVICE_ID'].count().to_dict()

{12: 3, 13: 2, 14: 1}
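A caution on this approach: `T_F != T_F.shift(1)` keeps the rows where the flag changes value, so it counts transitions, not rows from the first DIFF >= 4. The two happen to agree on the sample data. Also, `shift(1)` runs over the whole column rather than within each DEVICE_ID. The sketch below uses made-up data (hypothetical, not from the question) where an ID stays above the threshold, and compares this approach with the cumulative-sum rule used in another answer.

```python
import pandas as pd

# Hypothetical data: ID 12 stays at or above the threshold from index 1 onward
df = pd.DataFrame({"DEVICE_ID": [12, 12, 12, 12], "DIFF": [3, 4, 4, 4]})

t_f = df["DIFF"] >= 4

# This answer's approach: keep rows whose flag differs from the previous row
shift_counts = (
    df[t_f != t_f.shift(1)].groupby("DEVICE_ID")["DEVICE_ID"].count().to_dict()
)

# Literal rule: flag every row from the first DIFF >= 4 onward, then count per ID
cum_counts = (
    t_f.groupby(df["DEVICE_ID"]).cumsum().gt(0).groupby(df["DEVICE_ID"]).sum().to_dict()
)

print(shift_counts, cum_counts)
```

Here the shift-based count for ID 12 is 2 (rows 0 and 1 are the only flag changes), while the rows from index 1 through 3 number 3.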

Upvotes: 2

BENY

Reputation: 323226

Using cumprod

s=df.DIFF.lt(4).astype(int).groupby(df['DEVICE_ID']).cumprod()
s=(1-s).groupby(df['DEVICE_ID']).sum()
s
DEVICE_ID
12    3
13    2
14    1
Name: DIFF, dtype: int32
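Why this works: `DIFF.lt(4)` is 1 while the threshold has not been reached, and the per-ID `cumprod` stays 1 only until the first row with DIFF >= 4, after which it is 0 for the rest of that ID. `1 - s` therefore flags exactly the rows from the first hit onward. A small sketch on the sample data that keeps the intermediate steps as columns (the names `below`, `still_below` and `counted` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "DEVICE_ID": [12, 12, 12, 12, 13, 13, 13, 14, 14],
    "DIFF":      [3, 4, 5, 3, 2, 4, 1, 3, 6],
})

steps = df.copy()
steps["below"] = df["DIFF"].lt(4).astype(int)                              # 1 while DIFF < 4
steps["still_below"] = steps["below"].groupby(df["DEVICE_ID"]).cumprod()  # drops to 0 at the first hit
steps["counted"] = 1 - steps["still_below"]                               # 1 from the first hit onward
print(steps)

counts = steps.groupby("DEVICE_ID")["counted"].sum().to_dict()
print(counts)
```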

Upvotes: 3

jezrael

Reputation: 862611

First compare the column with Series.ge (>=), then group by df['DEVICE_ID'] and take a cumulative sum. Compare that with Series.gt(0) and aggregate with sum to count the True values:

s = df['DIFF'].ge(4).groupby(df['DEVICE_ID']).cumsum().gt(0).astype(int)

out = s.groupby(df['DEVICE_ID']).sum().to_dict()
print (out)
{12: 3, 13: 2, 14: 1}
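`cumsum().gt(0)` turns the running number of hits into a "threshold reached yet?" flag. A cumulative maximum of the 0/1 mask gives that flag directly. The sketch below swaps in `GroupBy.cummax` (a substitution, not part of this answer) and casts the mask to int first so it does not rely on boolean support in `cummax`.

```python
import pandas as pd

df = pd.DataFrame({
    "DEVICE_ID": [12, 12, 12, 12, 13, 13, 13, 14, 14],
    "DIFF":      [3, 4, 5, 3, 2, 4, 1, 3, 6],
})

# 0/1 mask of rows at or above the threshold
mask = df["DIFF"].ge(4).astype(int)

# Running maximum per ID: once a 1 appears, every later row of that ID stays 1
reached = mask.groupby(df["DEVICE_ID"]).cummax()

out = reached.groupby(df["DEVICE_ID"]).sum().to_dict()
print(out)
```

Both variants produce the same per-row flag; `cummax` just skips the extra comparison.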

Detail:

print (df['DIFF'].ge(4).groupby(df['DEVICE_ID']).cumsum())
index
0    0.0
1    1.0
2    2.0
3    2.0
4    0.0
5    1.0
6    1.0
7    0.0
8    1.0
Name: DIFF, dtype: float64

Another solution: set DEVICE_ID as the index, group by that index with level=0, and finally sum per index value (again grouping by level=0):

out = (df.set_index(['DEVICE_ID'])['DIFF']
         .ge(4)
         .groupby(level=0)
         .cumsum()
         .gt(0)
         .astype(int)
         .groupby(level=0)   # count the flagged rows per DEVICE_ID
         .sum()
         .to_dict())

Upvotes: 3
