Reputation: 2502
I have data of the following format:
station_number date river_height river_flow
0 1 2005-01-01 08:09:00 0.285233 0.782065
1 1 2005-01-01 11:28:12 0.129994 0.386652
2 4 2005-01-01 17:33:36 0.457168 0.167025
3 2 2005-01-01 23:21:00 0.359086 0.851716
4 4 2005-01-02 04:18:36 0.332998 0.830749
5 1 2005-01-02 09:28:12 0.867262 0.855507
6 3 2005-01-02 13:15:36 0.352409 0.023737
7 2 2005-01-02 17:31:12 0.696562 0.846762
8 1 2005-01-02 21:15:36 0.910944 0.096999
9 4 2005-01-03 02:13:12 0.981430 0.152109
I need to calculate a daily average of the river height and river flow per unique station number, so as a result something like this:
station_number date river_height river_flow
0 1 2005-01-01 0.285 0.782
1 1 2005-01-02 0.233 0.753
2 2 2005-01-01 0.129 0.386
3 2 2005-01-02 0.994 0.386
4 3 2005-01-01 0.457 0.167
5 3 2005-01-02 0.168 0.134
6 4 2005-01-01 0.356 0.321
7 4 2005-01-02 0.086 0.716
Keep in mind that the above numbers are random, and not actually the averages I'm looking for. I need an entry for each day for each station. I hope I have clarified what I need!
I have tried aggregating using groupby such as below:
monthly_flow_data_mean = df.groupby(pd.PeriodIndex(df['date'], freq="M"))['river_flow'].mean()
But this averages all river_flow measurements per month without taking the station numbers into account. I have had trouble finding the combination of groupby and aggregations that achieves what I need.
I tried this as well:
daily_flow_df = df.groupby(pd.PeriodIndex(df['date'], freq="D")).agg({"river_flow": "mean", "river_height": "mean", "station_number": "first"})
But I am pretty sure this doesn't work either: the station number is not used as a grouping key, it is only another aggregated column, so all measurements for a given day are still averaged together regardless of station.
I could of course split the dataframe into four dataframes (one per station), aggregate each one, and merge them back together. But I am wondering whether there is a groupby trick that achieves this in fewer lines, as it will also be useful later in my project(s), where the data might have many more classes.
Upvotes: 1
Views: 928
Reputation: 153510
You can group by 'station_number' and by the day of the 'date' column, using either pd.Grouper or dt.normalize:
df.groupby(['station_number', pd.Grouper(key='date', freq='D')]).mean()
or
df.groupby(['station_number', df['date'].dt.normalize()]).mean()
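As a minimal sketch of both variants, here is a self-contained example built from the ten sample rows in the question. It assumes the 'date' column has (or is converted to) a datetime dtype, which both pd.Grouper with freq and the .dt accessor require. Selecting the two value columns before calling mean() and the final reset_index() are my additions to get the flat layout shown in the question; the names value_cols, daily_grouper and daily_normalize are just illustrative.

```python
import pandas as pd

# Sample rows copied from the question; 'date' is parsed to datetime64.
df = pd.DataFrame({
    "station_number": [1, 1, 4, 2, 4, 1, 3, 2, 1, 4],
    "date": pd.to_datetime([
        "2005-01-01 08:09:00", "2005-01-01 11:28:12", "2005-01-01 17:33:36",
        "2005-01-01 23:21:00", "2005-01-02 04:18:36", "2005-01-02 09:28:12",
        "2005-01-02 13:15:36", "2005-01-02 17:31:12", "2005-01-02 21:15:36",
        "2005-01-03 02:13:12",
    ]),
    "river_height": [0.285233, 0.129994, 0.457168, 0.359086, 0.332998,
                     0.867262, 0.352409, 0.696562, 0.910944, 0.981430],
    "river_flow": [0.782065, 0.386652, 0.167025, 0.851716, 0.830749,
                   0.855507, 0.023737, 0.846762, 0.096999, 0.152109],
})

# Only these columns are averaged (illustrative name, not from the question).
value_cols = ["river_height", "river_flow"]

# Variant 1: pd.Grouper bins the 'date' column into calendar days.
daily_grouper = (
    df.groupby(["station_number", pd.Grouper(key="date", freq="D")])[value_cols]
      .mean()
      .reset_index()
)

# Variant 2: dt.normalize() sets every timestamp to midnight of its day.
daily_normalize = (
    df.groupby(["station_number", df["date"].dt.normalize()])[value_cols]
      .mean()
      .reset_index()
)

print(daily_grouper)
```

Both variants only return station/day combinations that actually occur in the data, so a station with no measurement on a given day gets no row for that day.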
Upvotes: 1