ojp
ojp

Reputation: 1033

Creating average daily values by grouping multiple columns

I have recently moved from R to Python and am struggling with some operations. I have data in long format and it looks like this.

date    Scaled  Name    percentage  data_days   z_value
0   2018-10-01 02:00:00 14.57   19  98.17   6265    -1.195691
1   2018-10-01 02:00:00 11.90   7   99.51   6351    -1.313125
2   2018-10-01 02:00:00 15.84   25  80.21   5119    -1.139832
3   2018-10-01 03:00:00 16.98   25  80.21   5119    -1.089692
4   2018-10-01 03:00:00 11.40   7   99.51   6351    -1.335117
... ... ... ... ... ... ...
345448  2019-06-24 00:00:00 56.16   69  92.90   5929    0.633561
345449  2019-06-24 00:00:00 40.85   70  68.71   4385    -0.039818
345450  2019-06-24 00:00:00 35.07   13  73.55   4694    -0.294040
345451  2019-06-24 00:00:00 75.54   18  40.13   2561    1.485951
345452  2019-06-24 00:00:00 40.96   90  42.59   2718    -0.034980

date          datetime64[ns]
Scaled               float64
Name                   int64
percentage           float64
data_days              int64
z_value              float64
dtype: object

I want to create two new columns, one the daily mean of 'Scaled' for each unique 'Name', and one the daily median of 'Scaled' for each unique 'Name'.

What I have implemented is this:

df_2['daily_mean_sensor'] = df_2['Scaled'].groupby([df_2['date'].dt.date]).transform('mean')
df_2['daily_median_sensor'] = df_2['Scaled'].groupby([df_2['date'].dt.date]).transform('median')

Which gives this:

date    Scaled  Name    percentage  data_days   z_value daily_mean_sensor   daily_median_sensor
0   2018-10-01 02:00:00 14.57   19  98.17   6265    -1.195691   30.570758   25.885
1   2018-10-01 02:00:00 11.90   7   99.51   6351    -1.313125   30.570758   25.885
2   2018-10-01 02:00:00 15.84   25  80.21   5119    -1.139832   30.570758   25.885
3   2018-10-01 03:00:00 16.98   25  80.21   5119    -1.089692   30.570758   25.885
4   2018-10-01 03:00:00 11.40   7   99.51   6351    -1.335117   30.570758   25.885
... ... ... ... ... ... ... ... ...
345448  2019-06-24 00:00:00 56.16   69  92.90   5929    0.633561    49.923855   48.470
345449  2019-06-24 00:00:00 40.85   70  68.71   4385    -0.039818   49.923855   48.470
345450  2019-06-24 00:00:00 35.07   13  73.55   4694    -0.294040   49.923855   48.470
345451  2019-06-24 00:00:00 75.54   18  40.13   2561    1.485951    49.923855   48.470
345452  2019-06-24 00:00:00 40.96   90  42.59   2718    -0.034980   49.923855   48.470

Clearly, these are the average daily values of 'Scaled' irrespective of the Name. I need to somehow group by day for each unique name but keep getting errors. Can someone provide a good way to solve my problem?

Upvotes: 1

Views: 96

Answers (2)

Mykola Zotko
Mykola Zotko

Reputation: 17794

You can use the method resample. To resample by month use 'M' as a rule parameter:

df.resample(rule='M', on='date')['Scaled'].transform('mean') # or 'median'

Upvotes: 0

ansev
ansev

Reputation: 30920

you also need to group by Name:

df_2['daily_mean_sensor'] = (df_2.groupby(['Name',df_2['date'].dt.date])['Scaled']
                                 .transform('mean')) 
df_2['daily_median_sensor'] = (df_2.groupby(['Name',df_2['date'].dt.date])['Scaled']
                                   .transform('median'))

or

df_2['daily_mean_sensor'] = (df_2['Scaled'].groupby([df_2['Name'],df_2['date'].dt.date])
                                 .transform('mean')) 
df_2['daily_median_sensor'] = (df_2['Scaled'].groupby([df_2['Name'],df_2['date'].dt.date])
                                   .transform('median'))

Upvotes: 4

Related Questions