ZdWhite

Reputation: 501

Pandas MultiIndex Rolling Group by

Pandas shows some strange groupby behavior with a MultiIndex. I don't understand why the rolling average is not calculated for one of these sites in the grouped call, yet works perfectly fine when that site is selected on its own.

Consider the following code:

all_data['rolling_avg_supply_temp'] = all_data.groupby(level=0).rolling('1h',on='ts')['supplytemp'].mean()
all_data['rolling_avg_supply_temp'].groupby(level=0).describe()
site     count       mean       std        min        25%        50%        75%        max
1A1    22130.0  21.698209  0.588990  14.611525  21.521668  21.837078  22.001139  23.345806
2B5    22533.0  21.952604  0.535339  18.900000  21.828639  21.976729  22.110053  25.985087
3B8     9515.0  22.060124  0.427317  19.500000  21.822788  22.004277  22.226354  24.400361
3B9    19234.0  21.810098  0.686575  19.530944  21.657984  21.966606  22.142692  23.987056
VC1A1      0.0        NaN       NaN        NaN        NaN        NaN        NaN        NaN

The VC1A1 site just doesn't get a rolling average applied in the grouped call.

However, consider this call on the same dataset:

all_data.loc["VC1A1"].rolling('1h',on='ts')['supplytemp'].mean().describe()
Statistic Value
count 14400.000000
mean 21.492677
std 0.406032
min 20.152830
25% 21.345195
50% 21.430232
75% 21.755384
max 22.265454
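Since the per-site call works, one possible workaround is to apply that same call to each site and concatenate the results, so the rolled Series carries the same (site, time) index as the frame it is assigned to. This is only a minimal sketch on synthetic data: `demo` is a hypothetical stand-in for `all_data`, the index level names `site` and `time` are assumptions, and it does not explain why the grouped call returns NaN for VC1A1.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for all_data (hypothetical values): two sites, one reading per minute.
ts = pd.date_range("2024-04-07 17:00", periods=180, freq="min", tz="UTC")
frames = [
    pd.DataFrame({"site": site, "ts": ts, "supplytemp": np.linspace(20.0, 22.0, len(ts))})
    for site in ["1A1", "VC1A1"]
]
demo = pd.concat(frames, ignore_index=True)

# Index by (site, time) while keeping 'ts' as a regular column, as in the question.
demo.index = pd.MultiIndex.from_arrays([demo["site"], demo["ts"]], names=["site", "time"])
demo = demo.drop(columns="site")

# Run the per-site call from the question once per site.
parts = {}
for site in demo.index.get_level_values("site").unique():
    parts[site] = demo.loc[site].rolling("1h", on="ts")["supplytemp"].mean()

# Concatenating the dict prepends the site key, giving a 2-level (site, time) index
# that matches demo's index, so the assignment aligns row for row.
rolled = pd.concat(parts, names=["site"])
demo["rolling_avg_supply_temp"] = rolled

print(demo.groupby(level=0)["rolling_avg_supply_temp"].count())
```

On the real data this assumes the (site, timestamp) index has no duplicate entries, because assigning a Series aligns on the index.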

There is no mixing of datatypes. For reference, here is the info on the dataset:

all_data.info()
Metadata Value
Total Entries 5,456,561
Memory Usage 654.1+ MB
MultiIndex From ('1A1', Timestamp('2024-04-07 17:00:00.431000+0000', tz='UTC'))
MultiIndex To ('VC1A1', Timestamp('2024-04-17 09:29:28.405000+0000', tz='UTC'))
Total Columns 11
Column Index  Column Name                Dtype
0             sn                         object
1             ts                         datetime64[ns, UTC]
2             maxcellt                   float64
3             mincellt                   float64
4             supplytemp                 float64
5             hvac_status_c1runstat      float64
6             hvac_status_c2runstat      float64
7             hvac_status_bypassdamp     float64
8             hvac_sensor_heatcurrent    float64
9             hvac_sensor_avgspacetemp   float64
10            timediff                   timedelta64[ns]
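Beyond dtypes, a few per-site checks might help narrow down where VC1A1 differs from the other sites. The sketch below uses a hypothetical helper, `site_diagnostics`, run on synthetic data. It reports row counts, non-null values, whether `ts` is sorted within each site, and duplicate index entries. These checks are guesses at things worth ruling out, not a confirmed cause.

```python
import numpy as np
import pandas as pd

def site_diagnostics(df, value_col="supplytemp", time_col="ts"):
    """Hypothetical helper: summarize each first-level group of a MultiIndexed frame."""
    rows = {}
    for site, frame in df.groupby(level=0):
        rows[site] = {
            "rows": len(frame),
            "non_null": int(frame[value_col].notna().sum()),
            "ts_sorted": bool(frame[time_col].is_monotonic_increasing),
            "dup_index": int(frame.index.duplicated().sum()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")

# Synthetic data (assumption): site 1A1 is time-ordered, site VC1A1 is stored in reverse order.
ts = pd.date_range("2024-04-07 17:00", periods=60, freq="min", tz="UTC")
a = pd.DataFrame({"site": "1A1", "ts": ts, "supplytemp": 21.0})
b = pd.DataFrame({"site": "VC1A1", "ts": ts[::-1], "supplytemp": 21.5})
demo = pd.concat([a, b], ignore_index=True)
demo.index = pd.MultiIndex.from_arrays([demo["site"], demo["ts"]], names=["site", "time"])
demo = demo.drop(columns="site")

report = site_diagnostics(demo)
print(report)
```

On the real frame the call would be `site_diagnostics(all_data)`. A site whose row stands out from the others is a reasonable place to look first.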

Upvotes: 0

Views: 27

Answers (0)
