Reputation: 319
I have a sample dataframe
Account Date Amount
10 2020-06-01 100
10 2020-06-11 500
10 2020-06-21 600
10 2020-06-25 900
10 2020-07-11 1000
10 2020-07-15 600
11 2020-06-01 100
11 2020-06-11 200
11 2020-06-21 500
11 2020-06-25 1500
11 2020-07-11 2500
11 2020-07-15 6700
I want to get the number of rows in each 30 day interval for each account ie
Account Date Amount
10 2020-06-01 1
10 2020-06-11 2
10 2020-06-21 3
10 2020-06-25 4
10 2020-07-11 4
10 2020-07-15 4
11 2020-06-01 1
11 2020-06-11 2
11 2020-06-21 3
11 2020-06-25 4
11 2020-07-11 4
11 2020-07-15 4
I have tried Grouper and resampling but those give me the counts per each 30 days and not the rolling counts.
Thanks in advance!
Upvotes: 6
Views: 1153
Reputation: 131
df = df.resample('30D').agg({'date':'count','Amount':'sum'})
This will aggregate the 'Date' column by count, getting the data you want.
However, since you will need to first set date as your index for resampling, you could create a "dummy" column containing zeros:
df['dummy'] = pd.Series(np.zeros(len(df))
Upvotes: 2
Reputation: 59579
You can use broadcasting within group to check how many rows fall within X days.
import pandas as pd
def within_days(s, days):
arr = ((s.to_numpy() >= s.to_numpy()[:, None])
& (s.to_numpy() <= (s + pd.offsets.DateOffset(days=days)).to_numpy()[:, None])).sum(axis=0)
return pd.Series(arr, index=s.index)
df['Amount'] = df.groupby('Account')['Date'].apply(within_days, days=30)
Account Date Amount
0 10 2020-06-01 1
1 10 2020-06-11 2
2 10 2020-06-21 3
3 10 2020-06-25 4
4 10 2020-07-11 4
5 10 2020-07-15 4
6 11 2020-06-01 1
7 11 2020-06-11 2
8 11 2020-06-21 3
9 11 2020-06-25 4
10 11 2020-07-11 4
11 11 2020-07-15 4
Upvotes: 2
Reputation: 195573
def get_rolling_amount(grp, freq):
return grp.rolling(freq, on="Date", closed="both").count()
df["Date"] = pd.to_datetime(df["Date"])
df["Amount"] = df.groupby("Account").apply(get_rolling_amount, "30D").values
print(df)
Prints:
Account Date Amount
0 10 2020-06-01 1
1 10 2020-06-11 2
2 10 2020-06-21 3
3 10 2020-06-25 4
4 10 2020-07-11 4
5 10 2020-07-15 4
6 11 2020-06-01 1
7 11 2020-06-11 2
8 11 2020-06-21 3
9 11 2020-06-25 4
10 11 2020-07-11 4
11 11 2020-07-15 4
Upvotes: 5