Reputation: 77
Sample Data
id lat lon datetime values
0 A1 32.553644 -117.061603 2007-05-11 44.90
1 A1 32.553644 -117.061603 2007-05-12 44.92
2 A2 32.553644 -117.061603 2007-05-13 44.92
3 A2 32.553644 -117.061603 2007-05-14 44.91
4 A3 32.553644 -117.061603 2007-05-15 44.92
I am trying to groupby id, month, year to get the monthly mean of values column.
The following code produces my DF with monthly mean values. The problem is I would like the result to include the month and year the mean consists of.
group_cols = ['id', 'lat', 'lon', df['datetime'].dt.month, df['datetime'].dt.year]
dfmean = df.groupby(group_cols, as_index=False)['values'].mean()
This produces the result:
id lat lon mean
0 A1 32.553644 -117.061603 0.373333
1 A2 32.553644 -117.061603 0.460000
2 A3 32.553644 -117.061603 0.485484
Note: The result above is on entire dataset of 1m plus records. So it does not match what the sample provided will produce.
My desired result would look something like this:
id lat lon date mean
0 A1 32.553644 -117.061603 05-2007 0.373333
1 A2 32.553644 -117.061603 05-2007 0.460000
2 A3 32.553644 -117.061603 05-2007 0.485484
order of columns is irrelevant. I just need it as a dataframe so i can select columns by df[column_name]
Also note: Month/Year do not have to be one column. It can also be a column with month and a column with year. It is just important that the date be included for each mean value.
Upvotes: 2
Views: 76
Reputation: 323226
IIUC
df['date']=df.datetime.dt.strftime('%Y-%m')
group_cols = ['id', 'lat', 'lon', 'date']
dfmean = df.groupby(group_cols, as_index=False)['values'].mean()
dfmean
Out[399]:
id lat lon date values
0 A1 32.553644 -117.061603 2007-05 44.910
1 A2 32.553644 -117.061603 2007-05 44.915
2 A3 32.553644 -117.061603 2007-05 44.920
Upvotes: 4