Reputation: 1033
I have recently moved from R to Python and am struggling with some operations. I have data in long format and it looks like this.
date Scaled Name percentage data_days z_value
0 2018-10-01 02:00:00 14.57 19 98.17 6265 -1.195691
1 2018-10-01 02:00:00 11.90 7 99.51 6351 -1.313125
2 2018-10-01 02:00:00 15.84 25 80.21 5119 -1.139832
3 2018-10-01 03:00:00 16.98 25 80.21 5119 -1.089692
4 2018-10-01 03:00:00 11.40 7 99.51 6351 -1.335117
... ... ... ... ... ... ...
345448 2019-06-24 00:00:00 56.16 69 92.90 5929 0.633561
345449 2019-06-24 00:00:00 40.85 70 68.71 4385 -0.039818
345450 2019-06-24 00:00:00 35.07 13 73.55 4694 -0.294040
345451 2019-06-24 00:00:00 75.54 18 40.13 2561 1.485951
345452 2019-06-24 00:00:00 40.96 90 42.59 2718 -0.034980
date datetime64[ns]
Scaled float64
Name int64
percentage float64
data_days int64
z_value float64
dtype: object
I want to create two new columns, one the daily mean of 'Scaled' for each unique 'Name', and one the daily median of 'Scaled' for each unique 'Name'.
What I have implemented is this:
df_2['daily_mean_sensor'] = df_2['Scaled'].groupby([df_2['date'].dt.date]).transform('mean')
df_2['daily_median_sensor'] = df_2['Scaled'].groupby([df_2['date'].dt.date]).transform('median')
Which gives this:
date Scaled Name percentage data_days z_value daily_mean_sensor daily_median_sensor
0 2018-10-01 02:00:00 14.57 19 98.17 6265 -1.195691 30.570758 25.885
1 2018-10-01 02:00:00 11.90 7 99.51 6351 -1.313125 30.570758 25.885
2 2018-10-01 02:00:00 15.84 25 80.21 5119 -1.139832 30.570758 25.885
3 2018-10-01 03:00:00 16.98 25 80.21 5119 -1.089692 30.570758 25.885
4 2018-10-01 03:00:00 11.40 7 99.51 6351 -1.335117 30.570758 25.885
... ... ... ... ... ... ... ... ...
345448 2019-06-24 00:00:00 56.16 69 92.90 5929 0.633561 49.923855 48.470
345449 2019-06-24 00:00:00 40.85 70 68.71 4385 -0.039818 49.923855 48.470
345450 2019-06-24 00:00:00 35.07 13 73.55 4694 -0.294040 49.923855 48.470
345451 2019-06-24 00:00:00 75.54 18 40.13 2561 1.485951 49.923855 48.470
345452 2019-06-24 00:00:00 40.96 90 42.59 2718 -0.034980 49.923855 48.470
Clearly, these are the average daily values of 'Scaled' irrespective of the Name. I need to somehow group by day for each unique name but keep getting errors. Can someone provide a good way to solve my problem?
Upvotes: 1
Views: 96
Reputation: 17794
You can use the method resample
. To resample by month use 'M'
as a rule
parameter:
df.resample(rule='M', on='date')['Scaled'].transform('mean') # or 'median'
Upvotes: 0
Reputation: 30920
you also need to group by Name:
df_2['daily_mean_sensor'] = (df_2.groupby(['Name',df_2['date'].dt.date])['Scaled']
.transform('mean'))
df_2['daily_median_sensor'] = (df_2.groupby(['Name',df_2['date'].dt.date])['Scaled']
.transform('median'))
or
df_2['daily_mean_sensor'] = (df_2['Scaled'].groupby([df_2['Name'],df_2['date'].dt.date])
.transform('mean'))
df_2['daily_median_sensor'] = (df_2['Scaled'].groupby([df_2['Name'],df_2['date'].dt.date])
.transform('median'))
Upvotes: 4