Reputation: 6252
I have data which is formed like this:
+----------+-------+-------+
| DAY | VALUE | Name |
+----------+-------+-------+
| 01/01/14 | 1030 | BOB
| 01/02/14 | 1020 | BOB
| 01/03/14 | 1080 | BOB
| 01/04/14 | 1090 | BOB
| 01/05/14 | 1040 | BOB
| 01/08/14 | 1030 | BOB
| 01/11/14 | 4030 | BOB
| 01/12/14 | 5000 | BOB
| 01/13/14 | 6000 | BOB
| 01/14/14 | 1096 | BOB
| 01/14/14 | 1200 | MIKE
| 01/15/14 | 1040 | MIKE
| 01/16/14 | 1600 | MIKE
| 01/17/14 | 1070 | MIKE
| 01/18/14 | 1340 | MIKE
| 01/19/14 | 1060 | MIKE
| 01/01/14 | 6000 | JANE
| 01/02/14 | 1700 | JANE
| 01/03/14 | 1070 | JANE
| 01/04/14 | 8000 | JANE
+----------+-------+------+
For each name there needs to be a row for the dates between 01/01/14 to 02/01/14 (1 month). As you can see Bob, Mike, and Jane (although in my real database there are thousands of names) are all missing dates between this time period. I would like to somehow insert the missing rows by interpolation of some sort. For example Bob is missing 01/06/14 and 01/07/14. I would like it to interpolate by adding these two dates and then the values to be the average of the two field between so these two missing fields would both have the value ((1040+1030)/2) = 1035. If there is no data before like for MIKE (starts at 01/14/14) I would like all the new rows to have 01/14/14 value now. I have tried various different techniques such as using coalesce command, cursors, but can't get it to work. Also I am not set on having these EXACT values, if there is some sort of math library which can interpolate I would be open to this as well. Thanks.
Upvotes: 1
Views: 936
Reputation: 1270513
You have two problems, generating the rows and interpolating the values. You can generate the rows with this SQL:
select d.day, n.name, t.value
from (select distinct name from table t) n cross join
(select distinct day from table t) d left outer join
table t
on t.name = n.name and t.day = d.day;
Doing the interpolation is troublesome. You can do this using variables and multiple sorting. Here is logic:
select day, name, value, prev_value,
@value as next_value,
@value := if(@name = name and value is not null, value, @value),
@name := name
from (select d.day, n.name, t.value,
@value as prev_value,
@value := if(@name = name and value is not null, value, @value),
@name := name
from (select distinct name from table t) n cross join
(select distinct day from table t) d left outer join
table t
on t.name = n.name and t.day = d.day cross join
(select @name := '', @value := NULL) vars
order by n.name, d.day
) t cross join
(select @name := '', @value := NULL) vars
order by n.name, d.day desc;
This will probably work for you, but it is depending on MySQL evaluating the expressions in order in each select
(for the assignment of variables). You can make the syntax more complicated to fix this, but that would hide the logic. You can now implement the logic that you want:
select day, name,
(case when value is not null then value
when prev_value is not null and next_value is not null
then (prev_value + next_value) / 2
when prev_value is null then next_value
else prev_value
end) as value
from (<previous query here>) t;
Upvotes: 1