tread
tread

Reputation: 11098

Calculating a Moving Average MySQL?

Good Day,

I am using the following code to calculate the 9 Day Moving average.

SELECT SUM(close)
FROM tbl
WHERE date <= '2002-07-05'
AND name_id = 2
ORDER BY date DESC
LIMIT 9

But it does not work because it first calculates all of the returned fields before the limit is called. In other words it will calculate all the closes before or equal to that date, and not just the last 9.

So I need to calculate the SUM from the returned select, rather than calculate it straight.

IE. Select the SUM from the SELECT...

Now how would I go about doing this and is it very costly or is there a better way?

Upvotes: 19

Views: 42674

Answers (6)

sag
sag

Reputation: 186

Moving MEDIAN (if someone will need it)

SELECT date,
   (SELECT AVG(close) AS median
    FROM
        (SELECT close, 
            ROW_NUMBER() OVER(ORDER BY close) AS r,
            COUNT(close) OVER() AS c
        FROM tbl
        WHERE datediff(t.date, date) 
                BETWEEN 1 AND 9
        ) AS t2

    WHERE t2.r BETWEEN FLOOR(t2.c/2)+MOD(t2.c,2) 
                    AND FLOOR(t2.c/2)+1
   ) AS mvgMedian
FROM tbl t
GROUP BY date
ORDER BY date

Upvotes: 0

olivier dufour
olivier dufour

Reputation: 332

an other technique is to do a table:

CREATE TABLE `tinyint_asc` (
 `value` tinyint(3) unsigned NOT NULL default '0',
 PRIMARY KEY (value)
) ;
​
INSERT INTO `tinyint_asc` VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250),(251),(252),(253),(254),(255);

After you can used it like that:

select 
    date_add(tbl.date, interval tinyint_asc.value day) as mydate, 
    count(*),
    sum(myvalue)
from tbl inner 
    join tinyint_asc.value <= 30 -- for a 30 day moving average
where date( date_add(o.created_at, interval tinyint_asc.value day ) ) between '2016-01-01' and current_date()
    group by mydate

Upvotes: 1

Lukas Eder
Lukas Eder

Reputation: 220832

Starting from MySQL 8, you should use window functions for this. Using the window RANGE clause, you can create a logical window over an interval, which is very powerful. Something like this:

SELECT
  date,
  close,
  AVG (close) OVER (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
FROM tbl
WHERE date <= DATE '2002-07-05'
AND name_id = 2
ORDER BY date DESC

For example:

WITH t (date, `close`) AS (
  SELECT DATE '2020-01-01', 50 UNION ALL
  SELECT DATE '2020-01-03', 54 UNION ALL
  SELECT DATE '2020-01-05', 51 UNION ALL
  SELECT DATE '2020-01-12', 49 UNION ALL
  SELECT DATE '2020-01-13', 59 UNION ALL
  SELECT DATE '2020-01-15', 30 UNION ALL
  SELECT DATE '2020-01-17', 35 UNION ALL
  SELECT DATE '2020-01-18', 39 UNION ALL
  SELECT DATE '2020-01-19', 47 UNION ALL
  SELECT DATE '2020-01-26', 50
)
SELECT
  date,
  `close`,
  COUNT(*) OVER w AS c,
  SUM(`close`) OVER w AS s,
  AVG(`close`) OVER w AS a
FROM t
WINDOW w AS (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
ORDER BY date DESC

Leading to:

date      |close|c|s  |a      |
----------|-----|-|---|-------|
2020-01-26|   50|1| 50|50.0000|
2020-01-19|   47|2| 97|48.5000|
2020-01-18|   39|3|136|45.3333|
2020-01-17|   35|4|171|42.7500|
2020-01-15|   30|4|151|37.7500|
2020-01-13|   59|5|210|42.0000|
2020-01-12|   49|6|259|43.1667|
2020-01-05|   51|3|159|53.0000|
2020-01-03|   54|3|154|51.3333|
2020-01-01|   50|3|155|51.6667|

Upvotes: 13

bebbo
bebbo

Reputation: 2939

This query is fast:

select date, name_id,
case @i when name_id then @i:=name_id else (@i:=name_id)
and (@n:=0)
and (@a0:=0) and (@a1:=0) and (@a2:=0) and (@a3:=0) and (@a4:=0) and (@a5:=0) and (@a6:=0) and (@a7:=0) and (@a8:=0)
end as a,
case @n when 9 then @n:=9 else @n:=@n+1 end as n,
@a0:=@a1,@a1:=@a2,@a2:=@a3,@a3:=@a4,@a4:=@a5,@a5:=@a6,@a6:=@a7,@a7:=@a8,@a8:=close,
(@a0+@a1+@a2+@a3+@a4+@a5+@a6+@a7+@a8)/@n as av
from tbl,
(select @i:=0, @n:=0,
        @a0:=0, @a1:=0, @a2:=0, @a3:=0, @a4:=0, @a5:=0, @a6:=0, @a7:=0, @a8:=0) a
where name_id=2
order by name_id, date

If you need an average over 50 or 100 values, it's tedious to write, but worth the effort. The speed is close to the ordered select.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269643

If you want the moving average for each date, then try this:

SELECT date, SUM(close),
       (select avg(close) from tbl t2 where t2.name_id = t.name_id and datediff(t2.date, t.date) <= 9
       ) as mvgAvg
FROM tbl t
WHERE date <= '2002-07-05' and
      name_id = 2
GROUP BY date
ORDER BY date DESC

It uses a correlated subquery to calculate the average of 9 values.

Upvotes: 27

Akash
Akash

Reputation: 5012

Use something like

SELECT 
  sum(close) as sum,
  avg(close) as average
FROM (
    SELECT 
      (close)
    FROM 
      tbl
    WHERE 
      date <= '2002-07-05'
      AND name_id = 2
    ORDER BY 
      date DESC
    LIMIT 9 ) temp

The inner query returns all filtered rows in desc order, and then you avg, sum up those rows returned.

The reason why the query given by you doesn't work is due to the fact that the sum is calculated first and the LIMIT clause is applied after the sum has already been calculated, giving you the sum of all the rows present

Upvotes: 5

Related Questions