Reputation: 606
I have a dataframe with the following structure:
Index: DatetimeIndex Columns: Client, Business, Balances
My data goes across multiple years. I am trying to get the rolling mean of Balances by year. Meaning I want a variable length window which on Jan 1st of the year will be a window of size 1 and on Dec 31st will be window of size 365. Then the following year it starts over at 1 and builds from there. I am also grouping by the Client and Business.
I have tried the following:
df.groupby(["Client", "Business"]).rolling("AS")["Balances"].mean()
I also tried rolling with "A"
.
I am getting the error:
ValueError: <YearBegin: month=1> is a non-fixed frequency
From the documentation I thought rolling would accept and offset for variable length window but they year ones im trying arent working.
Whats the right way to do this?
Upvotes: 1
Views: 470
Reputation: 1883
The error is likely because the number of days in a year is not fixed. Alternative could be to use 365d
instead as in code below using .expanding()
window and pd.Grouper()
here. Assuming the expected output looks as below as it is not posted in the question.
# Import libraries
import pandas as pd
# Create DataFrame
dt = [
'2020-05-29',
'2020-05-29',
'2020-01-29',
'2020-01-29',
'2020-02-29',
'2020-03-29',
'2020-06-29',
'2020-06-29'
]
df = pd.DataFrame({
'dt':times
})
df['dt'] = pd.to_datetime(df['dt'])
df['Balances'] = np.random.randint(0,100,8)
df['Client'] = ['A','B','C','A']*2
df['Business'] = ['x','y','z','x']*2
# Groupby
df.groupby([pd.Grouper(key='dt', freq='365d'), "Client", "Business"])['Balances'].expanding().mean()
Output
dfg
dt Client Business
2020-06-29 A x 0 44.0
3 51.0
4 63.0
7 36.0
B y 1 54.0
5 13.0
C z 2 43.0
6 48.0
Upvotes: 1