Reputation: 5644
I need to do something like:
SELECT value_column1
FROM table1
WHERE datetime_column1 >= '2009-01-01 00:00:00'
ORDER BY datetime_column1;
Except in addition to value_column1
, I also need to retrieve a moving average of the previous 20 values of value_column1
.
Standard SQL is preferred, but I will use MySQL extensions if necessary.
Upvotes: 6
Views: 19513
Reputation: 151
In mysql 8 window function frame can be used to obtain the averages.
SELECT value_column1, AVG(value_column1) OVER (ORDER BY datetime_column1 ROWS 19 PRECEDING) as ma
FROM table1
WHERE datetime_column1 >= '2009-01-01 00:00:00'
ORDER BY datetime_column1;
This calculates the average of the current row and 19 preceding rows.
Upvotes: 0
Reputation: 4563
I realize that this answer is about 7 years too late. I had a similar requirement and thought I'd share my solution in case it's useful to someone else.
There are some MySQL extensions for technical analysis that include a simple moving average. They're really easy to install and use: https://github.com/mysqludf/lib_mysqludf_ta#readme
Once you've installed the UDF (per instructions in the README), you can include a simple moving average in a select statement like this:
SELECT TA_SMA(value_column1, 20) AS sma_20 FROM table1 ORDER BY datetime_column1
Upvotes: 2
Reputation: 11
My solution adds a row number in table. The following example code may help:
set @MA_period=5;
select id1,tmp1.date_time,tmp1.c,avg(tmp2.c) from
(select @b:=@b+1 as id1,date_time,c from websource.EURUSD,(select @b:=0) bb order by date_time asc) tmp1,
(select @a:=@a+1 as id2,date_time,c from websource.EURUSD,(select @a:=0) aa order by date_time asc) tmp2
where id1>@MA_period and id1>=id2 and id2>(id1-@MA_period)
group by id1
order by id1 asc,id2 asc
Upvotes: 1
Reputation: 9
In my experience, Mysql as of 5.5.x tends not to use indexes on dependent selects, whether a subquery or join. This can have a very significant impact on performance where the dependent select criteria change on every row.
Moving average is an example of a query which falls into this category. Execution time may increase with the square of the rows. To avoid this, chose a database engine which can perform indexed look-ups on dependent selects. I find postgres works effectively for this problem.
Upvotes: 0
Reputation: 238088
Tom H's approach will work. You can simplify it like this if you have an identity column:
SELECT T1.id, T1.value_column1, avg(T2.value_column1)
FROM table1 T1
INNER JOIN table1 T2 ON T2.Id BETWEEN T1.Id-19 AND T1.Id
Upvotes: 2
Reputation: 4137
When I had a similar problem, I ended up using temp tables for a variety of reasons, but it made this a lot easier! What I did looks very similar to what you're doing, as far as the schema goes.
Make the schema something like ID identity, start_date, end_date, value. When you select, do a subselect avg of the previous 20 based on the identity ID.
Only do this if you find yourself already using temp tables for other reasons though (I hit the same rows over and over for different metrics, so it was helpful to have the small dataset).
Upvotes: 1
Reputation: 47454
This is just off the top of my head, and I'm on the way out the door, so it's untested. I also can't imagine that it would perform very well on any kind of large data set. I did confirm that it at least runs without an error though. :)
SELECT
value_column1,
(
SELECT
AVG(value_column1) AS moving_average
FROM
Table1 T2
WHERE
(
SELECT
COUNT(*)
FROM
Table1 T3
WHERE
date_column1 BETWEEN T2.date_column1 AND T1.date_column1
) BETWEEN 1 AND 20
)
FROM
Table1 T1
Upvotes: 18