Pratik Panchal
Pratik Panchal

Reputation: 53

sql query to avg over a fixed range over the entire data set

I have a sql table like this:

date temperature
12-22-2022 23
12-21-2022 26
12-20-2022 27
12-19-2022 25
12-18-2022 26
12-17-2022 27
12-16-2022 29
12-15-2022 28
12-14-2022 29
12-13-2022 30
12-12-2022 27
12-11-2022 29
12-10-2022 29
12-09-2022 28
12-08-2022 31
12-07-2022 32
12-06-2022 33
12-05-2022 34
12-04-2022 33
12-03-2022 32
12-02-2022 29
12-01-2022 33
11-30-2022 31
11-29-2022 33
11-28-2022 32

I want to write a query that would give me an average of 7 day chunks ending in the latest day.

week_ending_in_date avg_temperature
12-22-2022 26.1428
12-15-2022 28.5714
12-08-2022 32.0000
12-01-2022 32.2500

The last row is the average of 4 days ending in 12-01-2022 (i.e. 33,31,33,32)

This query is meant to run daily - so the week_ending_in_date will also change daily. How should I approach this with a sql query?

Upvotes: 0

Views: 132

Answers (1)

lemon
lemon

Reputation: 15492

You can solve this problem following these steps:

  • convert the format of your "date" field from VARCHAR to DATE
  • generate your partitions on your dates for each 7 days
  • aggregate for each partition, by getting the maximum date and the average temperature

In order to generate your partition, you use:

  • the ROW_NUMBER window function, if your date values are always consecutives (every day a new temperature will appear)
  • the UNIX_TIMESTAMP function, to normalize the date to actual weeks, if your date values are not always consecutives (you forget to record temperature values some days)

First Solution (fiddle):

WITH cte AS (
    SELECT STR_TO_DATE(date_, '%m-%d-%Y') AS date_as_date,  
           temperature 
    FROM tab
), partitioned_dates AS (
    SELECT *, (ROW_NUMBER() OVER(ORDER BY date_as_date DESC) -1) DIV 7 AS rn
    FROM cte
)
SELECT MAX(date_as_date) AS week_ending_in_date,
       AVG(temperature)  AS avg_temperature
FROM partitioned_dates
GROUP BY rn

Second Solution (fiddle):

WITH cte AS (
    SELECT STR_TO_DATE(date_, '%m-%d-%Y')                 AS date_as_date,
           UNIX_TIMESTAMP(STR_TO_DATE(date_, '%m-%d-%Y')) AS integer_date,  
           temperature 
    FROM tab
), partitioned_dates AS (
    SELECT *, (MAX(integer_date) OVER(ORDER BY integer_date DESC) - integer_date) DIV (60*60*24*7) AS rn
    FROM cte
)
SELECT MAX(date_as_date) AS week_ending_in_date,
       AVG(temperature)  AS avg_temperature
FROM partitioned_dates
GROUP BY rn

Extra tips:

  • Use MySQL standard for dates: you're storing in the %mm-%dd-%yyyy format, MySQL standard for dates is %dd-%mm-%yyyy.
  • Don't store dates as VARCHAR. MySQL has the DATE datetype, that allows you to use a wide toolkit of functions at your disposal.
  • Don't use names that match reserved MySQL keywords to name your tables and fields. Specifically, the "date" field name can conflict with the DATE reserved word for its homonimous datatype, you may run into syntax errors.

Following MySQL rules will considerably help you with manipulation and save yourself a lot of time.

Upvotes: 1

Related Questions