Reputation: 647
In Pandas I am trying create a column that counts all previous rows based on two column values.
The idea is to sort the data by animal and by date and then use pandas shift function to calculate the variance between an animals current date and previous date(the sample data below already has this done).
Then I want to create a new column called Previous_Count
that is a rolling count of previous records based on value in Animal
column and resets if the Date_Variance
field is larger than 10
.
Sample Data Code:
import pandas as pd
data = {'Date': ['2021-01-01','2021-01-05','2021-01-10','2021-01-11','2021-01-04','2021-01-08','2021-01-20','2021-01-21','2021-01-25','2021-02-12','2021-02-14'],
'Date_Variance': [0,4,5,1,0,4,0,1,4,18,2],
'Animal': ['Dog','Dog','Dog','Dog','Cat','Cat','Lizard','Lizard','Lizard','Lizard','Lizard'],
#'Previous_count': [0,1,2,3,0,1,0,1,2,0,1]
}
df = pd.DataFrame(data)
Sample Data Output with Previous_Count
column:
Date Date_Variance Animal Previous_count
1/1/2021 0 Dog 0
1/5/2021 4 Dog 1
1/10/2021 5 Dog 2
1/11/2021 1 Dog 3
1/4/2021 0 Cat 0
1/8/2021 4 Cat 1
1/20/2021 0 Lizard 0
1/21/2021 1 Lizard 1
1/25/2021 4 Lizard 2
2/12/2021 18 Lizard 0
2/14/2021 2 Lizard 1
Upvotes: 1
Views: 1031
Reputation: 30920
Use GroupBy.cumcount
. Here we use Series.cumsum
to get a new group whenever Date_Variance
is greater than 10
df['Previous_count'] = df.groupby(['Animal',
df['Date_Variance'].gt(10).cumsum()]).cumcount()
print(df)
Date Date_Variance Animal Previous_count
0 2021-01-01 0 Dog 0
1 2021-01-05 4 Dog 1
2 2021-01-10 5 Dog 2
3 2021-01-11 1 Dog 3
4 2021-01-04 0 Cat 0
5 2021-01-08 4 Cat 1
6 2021-01-20 0 Lizard 0
7 2021-01-21 1 Lizard 1
8 2021-01-25 4 Lizard 2
9 2021-02-12 18 Lizard 0
10 2021-02-14 2 Lizard 1
print(df['Date_Variance'].gt(10).cumsum())
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 1
Name: Date_Variance, dtype: int64
Upvotes: 1
Reputation: 23146
Try:
grouper = df.groupby("Animal")["Date_Variance"].apply(lambda x: x.gt(10).cumsum())
df["Previous_count"] = df.groupby(["Animal", grouper]).cumcount()
>>> df
Date Date_Variance Animal Previous_count
0 2021-01-01 0 Dog 0
1 2021-01-05 4 Dog 1
2 2021-01-10 5 Dog 2
3 2021-01-11 1 Dog 3
4 2021-01-04 0 Cat 0
5 2021-01-08 4 Cat 1
6 2021-01-20 0 Lizard 0
7 2021-01-21 1 Lizard 1
8 2021-01-25 4 Lizard 2
9 2021-02-12 18 Lizard 0
10 2021-02-14 2 Lizard 1
Upvotes: 1