Reputation: 1879
I am using the code below in an attempt to group by a column value, then run a cumulative sum and moving average on ordered data for that group only:
#this works OK
df['csum'] = df.sort_values(['name','day_time','delta_minutes'],ascending=True).groupby(['name']) ['value'].cumsum()
#throws error
df['rolling'] = df.sort_values(['name','day_time','delta_minutes'],ascending=True).groupby(['name'])['value'].rolling(window=2).mean()
Original dataframe:
name value delta_minutes day_time
0 MAC000039 0.069 0 2012-10-13
1 MAC000039 0.054 30 2012-10-13
2 MAC000039 0.085 60 2012-10-13
3 MAC000040 0.082 0 2012-10-12
4 MAC000040 0.053 30 2012-10-12
5 MAC000040 0.075 60 2012-10-12
6 MAC000040 0.195 90 2012-10-12
7 MAC000039 0.098 0 2012-10-12
8 MAC000039 0.055 30 2012-10-12
9 MAC000039 0.054 60 2012-10-12
10 MAC000039 0.099 90 2012-10-12
Expected output
name value delta_minutes day_time csum rolling
0 MAC000039 0.069 0 2012-10-13 0.375 ..
1 MAC000039 0.054 30 2012-10-13 0.429 ..
2 MAC000039 0.085 60 2012-10-13 0.514 ..
I get the following error when attempting the rolling mean
TypeError: incompatible index of inserted column with frame index
Any idea how to do this?
Upvotes: 1
Views: 1353
Reputation: 3275
The error occurs because df.sort_values(['name','day_time','delta_minutes'],ascending=True).groupby(['name'])['value'].rolling(window=2).mean()
has a MultiIndex
. To fix it, reset the 'name'
index and drop it.
import pandas as pd
"""
name value delta_minutes day_time
0 MAC000039 0.069 0 2012-10-13
1 MAC000039 0.054 30 2012-10-13
2 MAC000039 0.085 60 2012-10-13
3 MAC000040 0.082 0 2012-10-12
4 MAC000040 0.053 30 2012-10-12
5 MAC000040 0.075 60 2012-10-12
6 MAC000040 0.195 90 2012-10-12
7 MAC000039 0.098 0 2012-10-12
8 MAC000039 0.055 30 2012-10-12
9 MAC000039 0.054 60 2012-10-12
10 MAC000039 0.099 90 2012-10-12
"""
df = pd.read_clipboard()
# sorting before adding columns
df.sort_values([
'name',
'day_time',
'delta_minutes'
], inplace = True)
# cumulative sum grouped on name
df['csum'] = df.groupby('name').value.cumsum()
# reset index `name` and drop it
df['rolling'] = df.groupby('name').value.rolling(2).mean().reset_index(level = 'name', drop = True)
print(df)
name value delta_minutes day_time csum rolling
7 MAC000039 0.098 0 2012-10-12 0.098 NaN
8 MAC000039 0.055 30 2012-10-12 0.153 0.0765
9 MAC000039 0.054 60 2012-10-12 0.207 0.0545
10 MAC000039 0.099 90 2012-10-12 0.306 0.0765
0 MAC000039 0.069 0 2012-10-13 0.375 0.0840
1 MAC000039 0.054 30 2012-10-13 0.429 0.0615
2 MAC000039 0.085 60 2012-10-13 0.514 0.0695
3 MAC000040 0.082 0 2012-10-12 0.082 NaN
4 MAC000040 0.053 30 2012-10-12 0.135 0.0675
5 MAC000040 0.075 60 2012-10-12 0.210 0.0640
6 MAC000040 0.195 90 2012-10-12 0.405 0.1350
Upvotes: 2