Reputation: 193
I have a table with fields (id,letter,date)
and some data in it:
1 A 2012-01-01
2 B NULL
3 C NULL
4 D 2012-01-15
I want to fill the NULL values with an average date of nearest non-NULL values. Like that:
1 A 2012-01-01
2 B 2012-01-08
3 C 2012-01-08
4 D 2012-01-15
OR, maybe, even like that:
1 A 2012-01-01
2 B 2012-01-08
3 C 2012-01-11
4 D 2012-01-15
Both variants are great. Is there a simple way to implement it in MySQL?
Thanks in advance
UPD Table is pretty large, about 700.000 records, and about 50.000 gaps like described ones.
UPD2 A bit cleaner: table may be like that:
1 A 2012-01-01
2 B NULL
3 C NULL
4 D 2012-01-15
5 E NULL
6 F 2012-01-17
7 G NULL
8 H NULL
9 I 2012-01-20
The result expected is like:
1 A 2012-01-01
2 B **2012-01-08**
3 C **2012-01-08**
4 D 2012-01-15
5 E **2012-01-16**
6 F 2012-01-17
7 G **2012-01-18**
8 H **2012-01-18**
9 I 2012-01-20
(Asterisks are to note the changed values). Thanks
UPD3 THANKS EVERYONE. But I will just do that in another way, calculating date with a simple formula: needed_date = [(max(date)-min(date))/(max(id)-min(id)]*(my_ID-min(id)) + min(date)
Upvotes: 0
Views: 260
Reputation: 44363
SELECT id,letter,IFNULL(date,dt) date FROM mytable,
(SELECT DATE(mindate + INTERVAL (secdiff/2) SECOND) dt
FROM (SELECT mindate,UNIX_TIMESTAMP(maxdate)
- UNIX_TIMESTAMP(mindate) secdiff
FROM (SELECT MIN(date) mindate FROM mytable) N,
(SELECT MAX(date) maxdate FROM mytable) X) AA) A;
mysql> DROP TABLE IF EXISTS mytable;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE mytable
-> (
-> id int not null auto_increment,
-> letter char(1),
-> `date` date,
-> primary key (id)
-> );
Query OK, 0 rows affected (0.07 sec)
mysql> INSERT INTO mytable (letter,date) VALUES
-> ('A','2012-01-01'),('B',NULL),('C',NULL),('D','2012-01-15');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM mytable;
+----+--------+------------+
| id | letter | date |
+----+--------+------------+
| 1 | A | 2012-01-01 |
| 2 | B | NULL |
| 3 | C | NULL |
| 4 | D | 2012-01-15 |
+----+--------+------------+
4 rows in set (0.00 sec)
mysql>
mysql> SELECT id,letter,IFNULL(date,dt) date FROM mytable,
-> (SELECT DATE(mindate + INTERVAL (secdiff/2) SECOND) dt
-> FROM (SELECT mindate,UNIX_TIMESTAMP(maxdate)
-> - UNIX_TIMESTAMP(mindate) secdiff
-> FROM (SELECT MIN(date) mindate FROM mytable) N,
-> (SELECT MAX(date) maxdate FROM mytable) X) AA) A;
+----+--------+------------+
| id | letter | date |
+----+--------+------------+
| 1 | A | 2012-01-01 |
| 2 | B | 2012-01-08 |
| 3 | C | 2012-01-08 |
| 4 | D | 2012-01-15 |
+----+--------+------------+
4 rows in set (0.00 sec)
mysql>
This query uses the average of UNIX Timestamps. If all the dates are NULL, it uses today's date:
SELECT id,letter,IFNULL(date,dt) date FROM mytable,
(
SELECT IF(K=0,DATE(NOW()),avgdt) dt FROM
(SELECT DATE(FROM_UNIXTIME(AVG(UNIX_TIMESTAMP(date))))
avgdt FROM mytable) AA,
(SELECT COUNT(date) K FROM mytable) BB
) A;
mysql> SELECT id,letter,IFNULL(date,dt) date FROM mytable,
-> (
-> SELECT IF(K=0,DATE(NOW()),avgdt) dt FROM
-> (SELECT DATE(FROM_UNIXTIME(AVG(UNIX_TIMESTAMP(date))))
-> avgdt FROM mytable) AA,
-> (SELECT COUNT(date) K FROM mytable) BB
-> ) A;
+----+--------+------------+
| id | letter | date |
+----+--------+------------+
| 1 | A | 2012-01-01 |
| 2 | B | 2012-01-08 |
| 3 | C | 2012-01-08 |
| 4 | D | 2012-01-15 |
+----+--------+------------+
4 rows in set (0.05 sec)
mysql>
Upvotes: 1
Reputation: 55750
Assuming you have a table called T
like this:
CREATE TABLE T(
id INT,
time DATETIME
);
The following query would give you boundaries for each NULL record:
SELECT T.Id
, MAX(T1.Time) as MinDate
, MIN(T2.Time) as MaxDate
FROM T
INNER JOIN T T1 ON T1.Id < T.Id
AND T.time IS NULL
AND NOT T1.time IS NULL
INNER JOIN T T2 ON T2.id > T.id
AND T.time IS NULL
AND NOT T2.time IS NULL
GROUP BY Id
The output would be:
Id MinDate MaxDate
2 2012-01-01 2012-01-15
3 2012-01-01 2012-01-15
So the next step would be to do an update using the values from this result-set to update the NULLs with an average for instance..
UPDATE T
INNER JOIN
(
SELECT T.Id, MAX(T1.Time) as MinTime, MIN(T2.Time) as MaxTime
FROM T
INNER JOIN T T1 ON T1.id < T.id
AND T.time IS NULL
AND NOT T1.time IS NULL
INNER JOIN T T2 ON T2.id > T.id
AND T.time IS NULL
AND NOT T2.time IS NULL
GROUP BY T.ID) T3
ON T3.id = T.id
SET T.time = FROM_UNIXTIME((UNIX_TIMESTAMP(T3.MinTime) + UNIX_TIMESTAMP(T3.MaxTime)) / 2)
WHERE T.time IS NULL
Upvotes: 1