Reputation: 121
I have the following dataframe:
+-----+-----+-------------+-------------+-------------------------+
| ID1 | ID2 | Box1_weight | Box2_weight | Average Prev Weight ID1 |
+-----+-----+-------------+-------------+-------------------------+
| 19  | 677 | 3           | 2           | -                       |
+-----+-----+-------------+-------------+-------------------------+
| 677 | 19  | 1           | 0           | 2                       |
+-----+-----+-------------+-------------+-------------------------+
| 19  | 677 | 3           | 1           | (0+3)/2 = 1.5           |
+-----+-----+-------------+-------------+-------------------------+
| 19  | 677 | 7           | 0           | (3+0+3)/3 = 2           |
+-----+-----+-------------+-------------+-------------------------+
| 677 | 19  | 1           | 3           | (0+1+1)/3 ≈ 0.67        |
+-----+-----+-------------+-------------+-------------------------+
I want to calculate the moving average of the weight over the past 3 boxes for each ID, and I want to do this for every ID in ID1.
The column I want to calculate, along with the calculations, is shown in the table above as "Average Prev Weight ID1".
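A plain-Python sketch of the intended calculation, useful for checking the expected column (it is a reference loop, not a vectorised pandas solution). It walks the rows in order and keeps a deque of the last three weights per ID, counting an ID's weight whether it appears as ID1/Box1_weight or as ID2/Box2_weight; the current row's weights are added only after its average is read. The helper name `prev_avg_id1` is hypothetical.

```python
from collections import defaultdict, deque

import pandas as pd

# Sample data from the table above
df = pd.DataFrame({
    "ID1": [19, 677, 19, 19, 677],
    "ID2": [677, 19, 677, 677, 19],
    "Box1_weight": [3, 1, 3, 7, 1],
    "Box2_weight": [2, 0, 1, 0, 3],
})


def prev_avg_id1(df, window=3):
    """Hypothetical helper: average of the last `window` weights seen for
    each row's ID1 (from either ID column), excluding the current row."""
    history = defaultdict(lambda: deque(maxlen=window))
    out = []
    cols = ["ID1", "ID2", "Box1_weight", "Box2_weight"]
    for id1, id2, w1, w2 in df[cols].itertuples(index=False):
        past = history[id1]
        out.append(sum(past) / len(past) if past else float("nan"))
        # Record this row's weights only after reading the history
        history[id1].append(w1)
        history[id2].append(w2)
    return out


print(prev_avg_id1(df))  # values match the table: nan, 2, 1.5, 2, 2/3
```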
I can get a rolling average for each individual column using the following:
df_copy.groupby('ID1')['Box1_weight'].apply(lambda x: x.shift().rolling(period_length, min_periods=1).mean())
However, this does not take into account that the same ID may also appear in ID2, with its weight recorded in the "Box2_weight" column.
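Running the single-column version on the sample data makes the gap concrete. This sketch uses `transform` instead of `apply` so the result stays aligned with the rows of `df`; otherwise it is the same shift/rolling/mean chain with `period_length = 3`.

```python
import pandas as pd

df = pd.DataFrame({
    "ID1": [19, 677, 19, 19, 677],
    "ID2": [677, 19, 677, 677, 19],
    "Box1_weight": [3, 1, 3, 7, 1],
    "Box2_weight": [2, 0, 1, 0, 3],
})
period_length = 3

# Rolling average that only looks at Box1_weight for each ID1
single_col = df.groupby("ID1")["Box1_weight"].transform(
    lambda x: x.shift().rolling(period_length, min_periods=1).mean()
)
print(single_col.tolist())  # [nan, nan, 3.0, 3.0, 1.0]
```

The expected column is -, 2, 1.5, 2, 0.67, so every row after the first differs because the Box2_weight entries are ignored.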
How can I get a rolling average that is per ID, across the two columns?
Any guidance is appreciated.
Upvotes: 0
Views: 1681
Reputation: 5451
Here is my attempt:
Stack the two ID columns and the two weight columns to create a dataframe with a single ID column and a single weight column, calculate the running average per ID, and then assign the running average for ID1 back to the original dataframe.
I have used your rolling-average code, but I rearranged the data into df2 before applying it.
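import pandas as pd

d = {
    "ID1": [19, 677, 19, 19, 677],
    "ID2": [677, 19, 677, 677, 19],
    "Box1_weight": [3, 1, 3, 7, 1],
    "Box2_weight": [2, 0, 1, 0, 3],
}
df = pd.DataFrame(d)
print(df)  # display(df) also works in Jupyter

period_length = 3

# Interleave the two ID columns and the two weight columns row by row:
# row0-ID1, row0-ID2, row1-ID1, row1-ID2, ...
ids = df[["ID1", "ID2"]].stack().values
weights = df[["Box1_weight", "Box2_weight"]].stack().values
df2 = pd.DataFrame(dict(ids=ids, weights=weights))

# transform (rather than apply) keeps df2's row order, so the result can be
# reshaped back into one (ID1, ID2) pair per row of df
rolling_avg = (
    df2.groupby("ids")["weights"]
    .transform(lambda x: x.shift().rolling(period_length, min_periods=1).mean())
    .to_numpy()
    .reshape(-1, 2)
)
df["rolling_avg"] = rolling_avg[:, 0]
print(df)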
Result
ID1 ID2 Box1_weight Box2_weight
0 19 677 3 2
1 677 19 1 0
2 19 677 3 1
3 19 677 7 0
4 677 19 1 3
ID1 ID2 Box1_weight Box2_weight rolling_avg
0 19 677 3 2 NaN
1 677 19 1 0 2.000000
2 19 677 3 1 1.500000
3 19 677 7 0 2.000000
4 677 19 1 3 0.666667
Upvotes: 1
Reputation: 59274
Not sure if this is what you want. I had trouble understanding your requirements. But here's a go:
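import numpy as np
import pandas as pd

df = pd.DataFrame({"ID1": [19, 677, 19, 19, 677], "ID2": [677, 19, 677, 677, 19],
                   "Box1_weight": [3, 1, 3, 7, 1], "Box2_weight": [2, 0, 1, 0, 3]})

ids = ['ID1', 'ID2']
# Sort each row's ID pair and reorder its weights the same way
ind = np.argsort(df[ids].to_numpy(), 1)
make_sort = lambda s, ind: np.take_along_axis(s, ind, axis=1)
f = make_sort(df[ids].to_numpy(), ind)
s = make_sort(df[['Box1_weight', 'Box2_weight']].to_numpy(), ind)
df2 = pd.DataFrame(np.concatenate([f, s], 1), columns=df.columns)
# Mean of the previous 3 weights per ID; shift within each group, not across groups
res1 = df2.groupby('ID1').Box1_weight.rolling(3, min_periods=1).mean().groupby(level=0).shift()
res2 = df2.groupby('ID2').Box2_weight.rolling(3, min_periods=1).mean().groupby(level=0).shift()
means = pd.concat([res1, res2], axis=1).rename(columns={'Box1_weight': 'w1', 'Box2_weight': 'w2'})
# Look up each row's ID1 mean and restore the original row order
x = df.set_index([df.ID1.values, df.index])
final = x[ids].merge(means, left_index=True, right_index=True)[['w1', 'w2']].sum(axis=1).sort_index(level=1)
df['final_weight'] = final.tolist()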
ID1 ID2 Box1_weight Box2_weight final_weight
0 19 677 3 2 0.000000
1 677 19 1 0 2.000000
2 19 677 3 1 1.500000
3 19 677 7 0 2.000000
4 677 19 1 3 0.666667
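The row-wise sort at the start of this answer is what puts each ID in a fixed column. A small NumPy sketch on the first three rows of the sample data shows the effect: `np.argsort(..., axis=1)` gives each row's column order, and `np.take_along_axis` applies that same order to both the IDs and their weights, so each weight stays paired with its ID.

```python
import numpy as np

pairs = np.array([[19, 677], [677, 19], [19, 677]])
weights = np.array([[3, 2], [1, 0], [3, 1]])

# Per-row column order that sorts each ID pair ascending
ind = np.argsort(pairs, axis=1)

# Apply the same per-row order to the IDs and to their weights
sorted_pairs = np.take_along_axis(pairs, ind, axis=1)
sorted_weights = np.take_along_axis(weights, ind, axis=1)

print(sorted_pairs.tolist())    # [[19, 677], [19, 677], [19, 677]]
print(sorted_weights.tolist())  # [[3, 2], [0, 1], [3, 1]]
```

After sorting, each column holds a single ID here because the sample only contains the IDs 19 and 677; with more distinct IDs, the same ID could land in either column depending on its partner.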
Upvotes: 1