Reputation: 21
I'm looking to create a column in a MySQL table that calculates the running variance (or standard deviation, whichever would be easiest) of the last five values of another column. I am currently ordering the data using three variables: ID, date, and a counter (the counter ascends from 1 for each ID-date pairing). So each time a new ID-date combination begins, I'd like this new variance column to reset as well. Here is a little sample of what I'm going for:
+----+--------+---------+-------+--------------------------+ | ID | date | counter | value | var(value) | +----+--------+---------+-------+--------------------------+ | 11 | 1/1/13 | 1 | 2.1 | var(2.1) | | 11 | 1/1/13 | 2 | 2.4 | var(2.1,2.4) | | 11 | 1/1/13 | 3 | 2.3 | var(2.1,2.4,2.3) | | 11 | 1/1/13 | 4 | 2.5 | var(2.1,2.4,2.3,2.5) | | 11 | 1/1/13 | 5 | 2.3 | var(2.1,2.4,2.3,2.5,2.3) | | 11 | 1/1/13 | 6 | 2.5 | var(2.4,2.3,2.5,2.3,2.5) | | 11 | 1/3/13 | 1 | 5.4 | var(5.4) | | 11 | 1/3/13 | 2 | 4.3 | var(5.4,4.3) | | 11 | 1/3/13 | 3 | 3.4 | var(5.4,4.3,3.4) | | 11 | 1/3/13 | 4 | 2.1 | var(5.4,4.3,3.4,2.1) | +----+--------+---------+-------+--------------------------+
Does anyone know how this can be done in MySQL? I haven't found a solution to anything question similar to this one out there.
Thanks so much!
Upvotes: 1
Views: 140
Reputation: 1270391
The variance is the average of the sum of the squares of the differences between each value and the average. So, you can do this, with a lot of joins and arithmetic. Something like this:
select t.*,
(case when t1.date is null then 0
when t2.date is null
then (pow(t.value - (t.value + t1.value) / 2, 2) +
pow(t1.value - (t.value + t1.value) / 2, 2))/2
when t3.date is null
then (pow(t.value - (t.value + t1.value + t2.value) / 3, 2) +
pow(t1.value - (t.value + t1.value + t2.value) / 3, 2) +
pow(t2.value - (t.value + t1.value + t2.value) / 3, 2)
) / 3
when t4.date is null
then (pow(t.value - (t.value + t1.value + t2.value + t3.value) / 4, 2) +
pow(t1.value - (t.value + t1.value + t2.value + t3.value) / 4, 2) +
pow(t2.value - (t.value + t1.value + t2.value + t3.value) / 4, 2) +
pow(t3.value - (t.value + t1.value + t2.value + t3.value) / 4, 2)
) / 4
else (pow(t.value - (t.value + t1.value + t2.value + t3.value + t4.value) / 5, 2) +
pow(t1.value - (t.value + t1.value + t2.value + t3.value + t4.value) / 5, 2) +
pow(t2.value - (t.value + t1.value + t2.value + t3.value + t4.value) / 5, 2) +
pow(t3.value - (t.value + t1.value + t2.value + t3.value + t4.value) / 5, 2) +
pow(t4.value - (t.value + t1.value + t2.value + t3.value + t4.value) / 5, 2)
) / 5
end) as var
from t left outer join
t t1
on t.date = t1.date and t.counter = t1.counter + 1 left outer join
t t2
on t.date = t2.date and t.counter = t2.counter + 2 left outer join
t t3
on t.date = t3.date and t.counter = t3.counter + 3 left outer join
t t4
on t.date = t4.date and t.counter = t4.counter + 4;
Upvotes: 0
Reputation: 125925
You can group a self-join using a suitable aggregation function, such as VARIANCE()
:
SELECT a.*, VARIANCE(b.value)
FROM my_table a
JOIN my_table b ON b.ID = a.ID
AND b.date = a.date
AND b.counter <= a.counter
GROUP BY a.ID, a.date, a.counter, a.value
See it on sqlfiddle.
Upvotes: 2