Iterate through dates to calculate difference

Question

I am working with the following df:

              date/time wind (°) wind (kt) temp (C°) humidity(%) currents (°) currents (kt) stemp (C°) stemp_diff
0   2017-04-27 11:21:54    180.0      14.0      27.0      5000.0        200.0           0.4       25.4        2.6
1   2017-05-04 20:31:12    150.0      15.0      22.0      5000.0         30.0           0.2       26.5       -1.2
2   2017-05-08 05:00:52    110.0       6.0      25.0      5000.0         30.0           0.2       27.0       -1.7
3   2017-05-09 05:00:55    160.0      13.0      23.0      5000.0         30.0           0.6       27.0       -2.0
4   2017-05-10 16:39:16    160.0      20.0      22.0      5000.0         30.0           0.6       26.5       -1.8
5                   ...      ...       ...       ...         ...          ...           ...        ...        ...
6   2020-10-25 00:00:00   5000.0    5000.0      21.0        81.0       5000.0        5000.0       23.0       -2.0
7   2020-10-26 00:00:00   5000.0    5000.0      21.0        77.0       5000.0        5000.0       23.0       -2.0
8   2020-10-27 00:00:00   5000.0    5000.0      21.0        80.0       5000.0        5000.0       23.0       -2.0
9   2020-10-31 00:00:00   5000.0    5000.0      22.0        79.0       5000.0        5000.0       23.0       -2.0
10  2020-11-01 00:00:00   5000.0    5000.0      19.0        82.0       5000.0        5000.0       23.0       -2.0

I would like to find a way to iterate through the years and months my date/time column (i.e. April 2017, all the way to November 2020), and for each unique time period, calculate the average difference for the stemp column as a starting point (so the average for April 2017, May 2017, and so on, or even just use the months).

I'm trying to use this sort of code, but not quite sure how to translate it into something that works:

for '%MM' in australia_in['date/time']:
    australia_in['stemp_diff'] = australia_in['stemp (C°)'] - australia_out['stemp (C°)'] 
    australia_in

Any thoughts?

mozway · Accepted Answer

Let's first make sure that your date/time columns is of datetime type, and set it to index:

df['date/time'] = pd.to_datetime(df['date/time'])
df = df.set_index('date/time')

Then you have several options.

Using pandas.Grouper:

>>> df.groupby(pd.Grouper(freq='M'))['stemp (C°)'].mean().dropna()
date/time
2017-04-30    25.40
2017-05-31    26.75
2020-10-31    23.00
2020-11-30    23.00
Name: stemp (C°), dtype: float64

Using the year and month:

>>> df.groupby([df.index.year, df.index.month])['stemp (C°)'].mean()
date/time  date/time
2017       4            25.40
           5            26.75
2020       10           23.00
           11           23.00
Name: stemp (C°), dtype: float64

If you do not want to have the date/time as index, you can use the column instead:

df.groupby([df['date/time'].dt.year, df['date/time'].dt.month])['stemp (C°)'].mean()

And finally, to have nice names for the year and month:

(df.groupby([df['date/time'].dt.year.rename('year'),
             df['date/time'].dt.month.rename('month')])
   ['stemp (C°)'].mean()
)


output:
year  month
2017  4        25.40
      5        26.75
2020  10       23.00
      11       23.00
Name: stemp (C°), dtype: float64

Applying the same calculation on stemp_diff:

year  month
2017  4        2.600
      5       -1.675
2020  10      -2.000
      11      -2.000
Name: stemp_diff, dtype: float64

Iterate through dates to calculate difference

Answers (1)

Related Questions