Reputation: 147
I have a table like this:
date ID Value
1 aa 5.5
1 aa 5.5
1 bb 66
1 bb 66
2 cc 2.03
2 aa 0.1
2 aa 0.1
3 bb 7
4 dd 7
5 aa 4
5 aa 4
Some information about rows:
Date - Same date can appear in more than one row
ID - Same ID can appear in more than one row
If ID and Date are same, then value will also be same.
I want to calculate rolling().mean()
of Value
column. But I want to groupby date
and ID
, and calculate the rolling mean that, it does not take the mean of row with same date.
So this do not work, because in this, it takes rolling mean of same date twice.
df.groupby(["ID","date"])["Value"].rolling(3).mean()
I have implemented a for-loop solution but it is way slower, and I am working on millions of rows. This is my current solution.
uniqueID = df["ID"]
for idname in uniqueID:
temp = df.loc[df["ID"] == idname].drop_duplicates(inplace=False,keep="first",subset="date").set_index('date', inplace=False)["Value"].rolling(3).mean()
df.loc[df["ID"] == idname,"rollingmeanCol"] = pd.merge(df.loc[df["ID"]==idname,["date"]],temp,on=["date"],how="left")["Value"].values
Any faster solution without loops?
In the loop also, I am doing this query 3 times, any way to do this query 1 time only? df.loc[df["ID"] == idname]
Expected output (can be verified using the above loop code)
date ID Value rollingmeanCol
1 aa 5.5 NaN
1 aa 5.5 NaN
1 bb 66.0 NaN
1 bb 66.0 NaN
2 cc 2.03 NaN
2 aa 0.1 NaN
2 aa 0.1 NaN
3 bb 7.0 NaN
4 dd 7.0 NaN
5 aa 4.0 3.1999999999999997
5 aa 4.0 3.1999999999999997
Upvotes: 0
Views: 162
Reputation: 1420
You can try this:
# Calculate rolling mean
rollingmeanCol = (
df
.drop_duplicates()
.sort_values('date')
.set_index('date')
.groupby("ID")["Value"].rolling(3).mean()
.rename('rollingmeanCol')
)
# Merge it with your df
df = df.merge(rollingmeanCol, on=['date','ID'])
Output is the same as your expected.
Upvotes: 1