Pandas Groupby column in result

Question

Sample Data

   id        lat         lon   datetime  values
0  A1  32.553644 -117.061603 2007-05-11  44.90
1  A1  32.553644 -117.061603 2007-05-12  44.92
2  A2  32.553644 -117.061603 2007-05-13  44.92
3  A2  32.553644 -117.061603 2007-05-14  44.91
4  A3  32.553644 -117.061603 2007-05-15  44.92

I am trying to groupby id, month, year to get the monthly mean of values column.

The following code produces my DF with monthly mean values. The problem is I would like the result to include the month and year the mean consists of.

group_cols = ['id', 'lat', 'lon', df['datetime'].dt.month, df['datetime'].dt.year]
dfmean = df.groupby(group_cols, as_index=False)['values'].mean()

This produces the result:

   id       lat         lon      mean
0  A1  32.553644 -117.061603     0.373333
1  A2  32.553644 -117.061603     0.460000
2  A3  32.553644 -117.061603     0.485484

Note: The result above is on entire dataset of 1m plus records. So it does not match what the sample provided will produce.

My desired result would look something like this:

   id       lat         lon       date    mean
0  A1  32.553644 -117.061603     05-2007  0.373333
1  A2  32.553644 -117.061603     05-2007  0.460000
2  A3  32.553644 -117.061603     05-2007  0.485484

order of columns is irrelevant. I just need it as a dataframe so i can select columns by df[column_name]

Also note: Month/Year do not have to be one column. It can also be a column with month and a column with year. It is just important that the date be included for each mean value.

BENY · Accepted Answer

IIUC

df['date']=df.datetime.dt.strftime('%Y-%m')
group_cols = ['id', 'lat', 'lon', 'date']
dfmean = df.groupby(group_cols, as_index=False)['values'].mean()
dfmean
Out[399]: 
   id        lat         lon     date  values
0  A1  32.553644 -117.061603  2007-05  44.910
1  A2  32.553644 -117.061603  2007-05  44.915
2  A3  32.553644 -117.061603  2007-05  44.920

Pandas Groupby column in result

Answers (1)

Related Questions