Pandas has some very strange groupby behavior with a MultiIndex. I don't understand why the rolling average is not calculated for one of these sites in the grouped call, when the same calculation works perfectly fine for that site on its own.
Consider the following code:
all_data['rolling_avg_supply_temp'] = all_data.groupby(level=0).rolling('1h',on='ts')['supplytemp'].mean()
all_data['rolling_avg_supply_temp'].groupby(level=0).describe()
site | count | mean | std | min | 25% | 50% | 75% | max |
---|---|---|---|---|---|---|---|---|
1A1 | 22130.0 | 21.698209 | 0.588990 | 14.611525 | 21.521668 | 21.837078 | 22.001139 | 23.345806 |
2B5 | 22533.0 | 21.952604 | 0.535339 | 18.900000 | 21.828639 | 21.976729 | 22.110053 | 25.985087 |
3B8 | 9515.0 | 22.060124 | 0.427317 | 19.500000 | 21.822788 | 22.004277 | 22.226354 | 24.400361 |
3B9 | 19234.0 | 21.810098 | 0.686575 | 19.530944 | 21.657984 | 21.966606 | 22.142692 | 23.987056 |
VC1A1 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
The VC1A1 site just doesn't get a rolling average applied in the grouped call: its count is 0 and every statistic is NaN.
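One way to narrow this down is to look at the index of the grouped rolling result before it is assigned back, since column assignment aligns on index labels and any row whose label does not match ends up as NaN. The snippet below is only a sketch on a small synthetic frame: `demo`, its values, and its two sites are made up to mimic the layout of `all_data` (a `(site, ts)` MultiIndex with `ts` also kept as a column), and it does not claim to reproduce the missing site.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for all_data: two sites, one reading every 20 minutes.
ts_a = pd.date_range("2024-04-07 17:00", periods=6, freq="20min", tz="UTC")
ts_b = pd.date_range("2024-04-08 17:00", periods=6, freq="20min", tz="UTC")
demo = pd.DataFrame(
    {
        "site": ["1A1"] * 6 + ["VC1A1"] * 6,
        "ts": ts_a.append(ts_b),
        "supplytemp": np.arange(12, dtype="float64"),
    }
)
# (site, ts) MultiIndex, with ts still available as a column for rolling(on="ts").
demo = demo.set_index(["site", "ts"], drop=False).drop(columns="site")

# Same call shape as in the question, but kept in a variable instead of assigned.
rolled = demo.groupby(level=0).rolling("1h", on="ts")["supplytemp"].mean()

# Compare the shape of the two indexes; a mismatch here means the
# assignment has to realign labels rather than copy values row by row.
print("levels in result:", rolled.index.nlevels, rolled.index.names)
print("levels in frame: ", demo.index.nlevels, demo.index.names)
print("indexes identical:", rolled.index.equals(demo.index))

# Duplicate (site, ts) labels are another thing worth ruling out.
print("duplicate labels in frame:", int(demo.index.duplicated().sum()))
```

Running the same four checks against `all_data` should show whether the labels for VC1A1 in the result differ from the labels in the frame.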
However, consider this call on the same dataset:
all_data.loc["VC1A1"].rolling('1h',on='ts')['supplytemp'].mean().describe()
Statistic | Value |
---|---|
count | 14400.000000 |
mean | 21.492677 |
std | 0.406032 |
min | 20.152830 |
25% | 21.345195 |
50% | 21.430232 |
75% | 21.755384 |
max | 22.265454 |
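Since the per-site call works, one possible workaround is to run that same call once per site and concatenate the pieces, so that each piece keeps the original `(site, ts)` labels. This is a sketch under assumptions, not a diagnosis of the grouped call: `demo` is a hypothetical synthetic frame with made-up values, and it assumes the `(site, ts)` labels are unique and that `ts` is sorted within each site.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for all_data: two sites, one reading every 20 minutes.
ts_a = pd.date_range("2024-04-07 17:00", periods=6, freq="20min", tz="UTC")
ts_b = pd.date_range("2024-04-08 17:00", periods=6, freq="20min", tz="UTC")
demo = pd.DataFrame(
    {
        "site": ["1A1"] * 6 + ["VC1A1"] * 6,
        "ts": ts_a.append(ts_b),
        "supplytemp": np.arange(12, dtype="float64"),
    }
)
# (site, ts) MultiIndex, with ts still available as a column for rolling(on="ts").
demo = demo.set_index(["site", "ts"], drop=False).drop(columns="site")

# Roll each site separately; every piece keeps that site's original index labels.
pieces = [
    grp.rolling("1h", on="ts")["supplytemp"].mean()
    for _, grp in demo.groupby(level=0)
]
per_site = pd.concat(pieces)

# The labels match the frame, so the assignment aligns row for row.
demo["rolling_avg_supply_temp"] = per_site

# Every site should now report a non-zero count.
print(demo["rolling_avg_supply_temp"].groupby(level=0).count())
```

On the real data the loop is slower than a single grouped call, but it makes it easy to see which site, if any, produces an unexpected result.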
There is no mixing of datatypes. For reference, here is the info on the dataset:
all_data.info()
Metadata | Value |
---|---|
Total Entries | 5,456,561 |
Memory Usage | 654.1+ MB |
MultiIndex From | ('1A1', Timestamp('2024-04-07 17:00:00.431000+0000', tz='UTC')) |
MultiIndex To | ('VC1A1', Timestamp('2024-04-17 09:29:28.405000+0000', tz='UTC')) |
Total Columns | 11 |
Column Index | Column Name | Dtype |
---|---|---|
0 | sn | object |
1 | ts | datetime64[ns, UTC] |
2 | maxcellt | float64 |
3 | mincellt | float64 |
4 | supplytemp | float64 |
5 | hvac_status_c1runstat | float64 |
6 | hvac_status_c2runstat | float64 |
7 | hvac_status_bypassdamp | float64 |
8 | hvac_sensor_heatcurrent | float64 |
9 | hvac_sensor_avgspacetemp | float64 |
10 | timediff | timedelta64[ns] |