I ran a very simple aggregation by quarter in pandas and, just out of curiosity, checked the totals.
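import pandas as pd
# pd.TimeGrouper was removed from pandas; pd.Grouper(key=..., freq=...) is the replacement
dfQtr = df.groupby([pd.Grouper(key='Date', freq='Q'), 'JourneyType', 'OriginCode', 'DestinationCode'])['TotalFlights'].sum().reset_index()
print(dfQtr.TotalFlights.sum(), df.TotalFlights.sum())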
941899 967205
@IanS My apologies, here is a subset of the fairly large dataset:
Date JourneyType OriginCode DestinationCode Total_Flights
01/08/2015 T_A-M-R-A-S_M_R_M_S D_P FLL SDQ 1
01/08/2015 T_A-M-R-A-S_M_R_M_S D_P PAP FLL 1
01/08/2015 T_A-M-R-A-S_M_R_M_S D_P TPA BDL 1
01/08/2015 T_A-M-R-A-S_M_R_M_S D_P HPN MCO 1
01/08/2015 T_A-L-O-C-G_L_P_D_S D_P FLL PAP 1
01/08/2015 T_A-L-O-C-G_L_P_D_S D_P FLL PAP 1
01/08/2015 T_A-L-O-C-G_L_P_D_S D_P FLL PIT 1
The totals differ before and after aggregation (941899 vs 967205), and I wonder why that might be.
Many thanks! Will
"NA groups in GroupBy are automatically excluded"
http://pandas.pydata.org/pandas-docs/stable/missing_data.html#na-values-in-groupby
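A minimal sketch of the documented behaviour, assuming a recent pandas (pd.Grouper in place of the removed pd.TimeGrouper) and made-up toy data: a single row with a missing OriginCode silently drops out of the grouped total. pd.offsets.QuarterEnd() is used instead of the 'Q' string alias only to avoid alias deprecation warnings across pandas versions.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration): the second row has no OriginCode.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-08", "2015-01-08", "2015-02-10", "2015-04-01"]),
    "OriginCode": ["FLL", np.nan, "FLL", "TPA"],
    "TotalFlights": [1, 1, 1, 1],
})

# Quarterly grouping plus a second key; the NaN OriginCode row is excluded by default.
grouped = (
    df.groupby([pd.Grouper(key="Date", freq=pd.offsets.QuarterEnd()), "OriginCode"])["TotalFlights"]
    .sum()
    .reset_index()
)

print(grouped["TotalFlights"].sum(), df["TotalFlights"].sum())  # prints: 3 4
```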
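I'm guessing you have some missing values (NaN/NaT) somewhere in the grouping columns (Date, JourneyType, OriginCode or DestinationCode). groupby drops those rows, so their TotalFlights never reach the quarterly sums.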
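To test that guess on the real data, one could list the rows groupby would exclude and check that they account for the whole gap. This is a sketch under assumptions: the column names follow the question's code (TotalFlights rather than Total_Flights), the helpers quarterly_totals and rows_with_missing_keys are hypothetical, and the data is made up.

```python
import numpy as np
import pandas as pd

KEY_COLS = ["JourneyType", "OriginCode", "DestinationCode"]


def quarterly_totals(df, value_col="TotalFlights"):
    """Hypothetical helper: quarterly sums; rows with NaN in any key are dropped."""
    return (
        df.groupby([pd.Grouper(key="Date", freq=pd.offsets.QuarterEnd()), *KEY_COLS])[value_col]
        .sum()
        .reset_index()
    )


def rows_with_missing_keys(df):
    """Hypothetical helper: rows with a missing Date or key column (excluded by groupby)."""
    return df[df[["Date", *KEY_COLS]].isna().any(axis=1)]


# Toy data (made up): the second row has no DestinationCode.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-08", "2015-01-08", "2015-01-08"]),
    "JourneyType": ["X", "X", "X"],
    "OriginCode": ["FLL", "PAP", "FLL"],
    "DestinationCode": ["SDQ", np.nan, "PIT"],
    "TotalFlights": [1, 1, 1],
})

missing = rows_with_missing_keys(df)
print(len(missing), missing["TotalFlights"].sum())  # prints: 1 1

# The gap between the two totals is fully explained by the excluded rows.
assert (quarterly_totals(df)["TotalFlights"].sum()
        + missing["TotalFlights"].sum() == df["TotalFlights"].sum())

# One way to keep those rows: fill missing text keys with a placeholder label first.
filled = df.fillna({c: "UNKNOWN" for c in KEY_COLS})
print(quarterly_totals(filled)["TotalFlights"].sum())  # prints: 3
```

Filling with a placeholder is only one option; pandas 1.1 and later also accept groupby(..., dropna=False), which keeps NaN keys as their own group. Rows with a missing Date still cannot be placed in a quarter, so those need fixing at the source.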