Reputation: 361
I have a dataframe that is very large, in this format:
id time price
ABC 01:04 100
XYZ 01:04 50
QRS 01:04 25
ABC 01:03 100
XYZ 01:03 50
QRS 01:03 25
ABC 01:02 100
XYZ 01:02 50
QRS 01:02 25
ABC 01:01 100
XYZ 01:01 50
QRS 01:01 25
ABC 01:00 100
XYZ 01:00 50
QRS 01:00 25
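For reference, the sample above can be built like this (plain strings for time are enough to illustrate the problem):

import pandas as pd

df = pd.DataFrame({
    'id':    ['ABC', 'XYZ', 'QRS'] * 5,
    'time':  [t for t in ['01:04', '01:03', '01:02', '01:01', '01:00'] for _ in range(3)],
    'price': [100, 50, 25] * 5,
})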
In this case, 1:04pm is the most recent data, and I want a rolling mean of price for each ticker, going back 4 periods. I don't need the rolling means that start at 1:03pm, 1:02pm, or 1:01pm.
I have this code so far, which takes care of everything except that it calculates means for all time periods, not just the most recent one:
rmean = df.groupby('id').rolling(window=3).mean()
The reason I only need to calculate it for the most recent time period is that my actual data is massive, and computing it for every period takes too long.
Expected output:
id time price
ABC 01:04 100
XYZ 01:04 50
QRS 01:04 25
Upvotes: 0
Views: 807
Reputation: 2128
You can use .groupby + .agg: to compute the rolling mean for only the most recent data, sort by time in descending order, take head(3) of each group, and compute its mean.
Use:
new_df = (df.sort_values(by=['time'], ascending=False)
            .groupby('id', as_index=False)
            .agg(
                time=('time', 'first'),                             # most recent timestamp per id
                price=('price', lambda x: x.head(3).values.mean())  # mean of the 3 most recent prices
            ))
Prints:
>>> new_df
    id   time  price
0  ABC  01:04    100
1  QRS  01:04     25
2  XYZ  01:04     50
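If the per-group lambda is still too slow on the full dataset, the same idea can be written with only built-in aggregations by first trimming each group to its three most recent rows; whether this is actually faster depends on your data, so it is worth timing both versions:

# Keep only the 3 most recent rows per id, then aggregate with built-in reducers
# instead of calling a Python lambda per group.
recent = (df.sort_values(by=['time'], ascending=False)
            .groupby('id')
            .head(3))

new_df = (recent.groupby('id', as_index=False)
                .agg(time=('time', 'first'),    # most recent timestamp per id
                     price=('price', 'mean')))  # mean of those 3 prices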
Upvotes: 2