Reputation: 647

Pandas Count number of previous rows based on column value and condition

In Pandas I am trying create a column that counts all previous rows based on two column values.

The idea is to sort the data by animal and by date and then use pandas shift function to calculate the variance between an animals current date and previous date(the sample data below already has this done).

Then I want to create a new column called Previous_Count that is a rolling count of previous records based on value in Animal column and resets if the Date_Variance field is larger than 10.

Sample Data Code:

import pandas as pd

data = {'Date': ['2021-01-01','2021-01-05','2021-01-10','2021-01-11','2021-01-04','2021-01-08','2021-01-20','2021-01-21','2021-01-25','2021-02-12','2021-02-14'],
        'Date_Variance': [0,4,5,1,0,4,0,1,4,18,2],
        'Animal': ['Dog','Dog','Dog','Dog','Cat','Cat','Lizard','Lizard','Lizard','Lizard','Lizard'],
        #'Previous_count': [0,1,2,3,0,1,0,1,2,0,1]
        }

df = pd.DataFrame(data)

Sample Data Output with Previous_Count column:

Date       Date_Variance    Animal  Previous_count
1/1/2021    0               Dog     0
1/5/2021    4               Dog     1
1/10/2021   5               Dog     2
1/11/2021   1               Dog     3
1/4/2021    0               Cat     0
1/8/2021    4               Cat     1
1/20/2021   0               Lizard  0
1/21/2021   1               Lizard  1
1/25/2021   4               Lizard  2
2/12/2021   18              Lizard  0
2/14/2021   2               Lizard  1

Upvotes: 1

Answers (2)

ansev

Reputation: 30920

Use GroupBy.cumcount. Here we use Series.cumsum to get a new group whenever Date_Variance is greater than 10

df['Previous_count'] = df.groupby(['Animal',
                                   df['Date_Variance'].gt(10).cumsum()]).cumcount()

print(df)

          Date  Date_Variance  Animal  Previous_count
0   2021-01-01              0     Dog               0
1   2021-01-05              4     Dog               1
2   2021-01-10              5     Dog               2
3   2021-01-11              1     Dog               3
4   2021-01-04              0     Cat               0
5   2021-01-08              4     Cat               1
6   2021-01-20              0  Lizard               0
7   2021-01-21              1  Lizard               1
8   2021-01-25              4  Lizard               2
9   2021-02-12             18  Lizard               0
10  2021-02-14              2  Lizard               1

print(df['Date_Variance'].gt(10).cumsum())
0     0
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     0
9     1
10    1
Name: Date_Variance, dtype: int64

Upvotes: 1

not_speshal

Reputation: 23146

Try:

grouper = df.groupby("Animal")["Date_Variance"].apply(lambda x: x.gt(10).cumsum())
df["Previous_count"] = df.groupby(["Animal", grouper]).cumcount()

>>> df
          Date  Date_Variance  Animal  Previous_count
0   2021-01-01              0     Dog               0
1   2021-01-05              4     Dog               1
2   2021-01-10              5     Dog               2
3   2021-01-11              1     Dog               3
4   2021-01-04              0     Cat               0
5   2021-01-08              4     Cat               1
6   2021-01-20              0  Lizard               0
7   2021-01-21              1  Lizard               1
8   2021-01-25              4  Lizard               2
9   2021-02-12             18  Lizard               0
10  2021-02-14              2  Lizard               1

Upvotes: 1

Pandas Count number of previous rows based on column value and condition

Answers (2)

Related Questions