Reputation: 59
I am working with the following df:
date/time wind (°) wind (kt) temp (C°) humidity(%) currents (°) currents (kt) stemp (C°) stemp_diff
0 2017-04-27 11:21:54 180.0 14.0 27.0 5000.0 200.0 0.4 25.4 2.6
1 2017-05-04 20:31:12 150.0 15.0 22.0 5000.0 30.0 0.2 26.5 -1.2
2 2017-05-08 05:00:52 110.0 6.0 25.0 5000.0 30.0 0.2 27.0 -1.7
3 2017-05-09 05:00:55 160.0 13.0 23.0 5000.0 30.0 0.6 27.0 -2.0
4 2017-05-10 16:39:16 160.0 20.0 22.0 5000.0 30.0 0.6 26.5 -1.8
5 ... ... ... ... ... ... ... ... ...
6 2020-10-25 00:00:00 5000.0 5000.0 21.0 81.0 5000.0 5000.0 23.0 -2.0
7 2020-10-26 00:00:00 5000.0 5000.0 21.0 77.0 5000.0 5000.0 23.0 -2.0
8 2020-10-27 00:00:00 5000.0 5000.0 21.0 80.0 5000.0 5000.0 23.0 -2.0
9 2020-10-31 00:00:00 5000.0 5000.0 22.0 79.0 5000.0 5000.0 23.0 -2.0
10 2020-11-01 00:00:00 5000.0 5000.0 19.0 82.0 5000.0 5000.0 23.0 -2.0
I would like to find a way to iterate through the years and months my date/time
column (i.e. April 2017, all the way to November 2020), and for each unique time period, calculate the average difference for the stemp
column as a starting point (so the average for April 2017, May 2017, and so on, or even just use the months).
I'm trying to use this sort of code, but not quite sure how to translate it into something that works:
for '%MM' in australia_in['date/time']:
australia_in['stemp_diff'] = australia_in['stemp (C°)'] - australia_out['stemp (C°)']
australia_in
Any thoughts?
Upvotes: 1
Views: 167
Reputation: 262484
Let's first make sure that your date/time
columns is of datetime
type, and set it to index:
df['date/time'] = pd.to_datetime(df['date/time'])
df = df.set_index('date/time')
Then you have several options.
Using pandas.Grouper
:
>>> df.groupby(pd.Grouper(freq='M'))['stemp (C°)'].mean().dropna()
date/time
2017-04-30 25.40
2017-05-31 26.75
2020-10-31 23.00
2020-11-30 23.00
Name: stemp (C°), dtype: float64
Using the year and month:
>>> df.groupby([df.index.year, df.index.month])['stemp (C°)'].mean()
date/time date/time
2017 4 25.40
5 26.75
2020 10 23.00
11 23.00
Name: stemp (C°), dtype: float64
If you do not want to have the date/time as index, you can use the column instead:
df.groupby([df['date/time'].dt.year, df['date/time'].dt.month])['stemp (C°)'].mean()
And finally, to have nice names for the year and month:
(df.groupby([df['date/time'].dt.year.rename('year'),
df['date/time'].dt.month.rename('month')])
['stemp (C°)'].mean()
)
output:
year month
2017 4 25.40
5 26.75
2020 10 23.00
11 23.00
Name: stemp (C°), dtype: float64
Applying the same calculation on stemp_diff
:
year month
2017 4 2.600
5 -1.675
2020 10 -2.000
11 -2.000
Name: stemp_diff, dtype: float64
Upvotes: 1