No Way
No Way

Reputation: 193

Fill the gaps of NULL-s in a table with average values

I have a table with fields (id,letter,date) and some data in it:

1 A 2012-01-01
2 B NULL
3 C NULL
4 D 2012-01-15

I want to fill the NULL values with an average date of nearest non-NULL values. Like that:

1 A 2012-01-01
2 B 2012-01-08
3 C 2012-01-08
4 D 2012-01-15

OR, maybe, even like that:

1 A 2012-01-01
2 B 2012-01-08
3 C 2012-01-11
4 D 2012-01-15

Both variants are great. Is there a simple way to implement it in MySQL?

Thanks in advance

UPD Table is pretty large, about 700.000 records, and about 50.000 gaps like described ones.

UPD2 A bit cleaner: table may be like that:

1 A 2012-01-01
2 B NULL
3 C NULL
4 D 2012-01-15
5 E NULL
6 F 2012-01-17
7 G NULL
8 H NULL
9 I 2012-01-20

The result expected is like:

1 A 2012-01-01
2 B **2012-01-08**
3 C **2012-01-08**
4 D 2012-01-15
5 E **2012-01-16**
6 F 2012-01-17
7 G **2012-01-18**
8 H **2012-01-18**
9 I 2012-01-20

(Asterisks are to note the changed values). Thanks

UPD3 THANKS EVERYONE. But I will just do that in another way, calculating date with a simple formula: needed_date = [(max(date)-min(date))/(max(id)-min(id)]*(my_ID-min(id)) + min(date)

Upvotes: 0

Views: 260

Answers (2)

RolandoMySQLDBA
RolandoMySQLDBA

Reputation: 44363

QUERY #1

SELECT id,letter,IFNULL(date,dt) date FROM mytable,
(SELECT DATE(mindate + INTERVAL (secdiff/2) SECOND) dt
FROM (SELECT mindate,UNIX_TIMESTAMP(maxdate)
- UNIX_TIMESTAMP(mindate) secdiff
FROM (SELECT MIN(date) mindate FROM mytable) N,
(SELECT MAX(date) maxdate FROM mytable) X) AA) A;

SAMPLE DATA

mysql> DROP TABLE IF EXISTS mytable;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE mytable
    -> (
    ->    id int not null auto_increment,
    ->    letter char(1),
    ->    `date` date,
    ->    primary key (id)
    -> );
Query OK, 0 rows affected (0.07 sec)

mysql> INSERT INTO mytable (letter,date) VALUES
    -> ('A','2012-01-01'),('B',NULL),('C',NULL),('D','2012-01-15');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM mytable;
+----+--------+------------+
| id | letter | date       |
+----+--------+------------+
|  1 | A      | 2012-01-01 |
|  2 | B      | NULL       |
|  3 | C      | NULL       |
|  4 | D      | 2012-01-15 |
+----+--------+------------+
4 rows in set (0.00 sec)

mysql>

QUERY #1 EXECUTED

mysql> SELECT id,letter,IFNULL(date,dt) date FROM mytable,
    -> (SELECT DATE(mindate + INTERVAL (secdiff/2) SECOND) dt
    -> FROM (SELECT mindate,UNIX_TIMESTAMP(maxdate)
    -> - UNIX_TIMESTAMP(mindate) secdiff
    -> FROM (SELECT MIN(date) mindate FROM mytable) N,
    -> (SELECT MAX(date) maxdate FROM mytable) X) AA) A;
+----+--------+------------+
| id | letter | date       |
+----+--------+------------+
|  1 | A      | 2012-01-01 |
|  2 | B      | 2012-01-08 |
|  3 | C      | 2012-01-08 |
|  4 | D      | 2012-01-15 |
+----+--------+------------+
4 rows in set (0.00 sec)

mysql>

QUERY #2 (Cleaner Version)

This query uses the average of UNIX Timestamps. If all the dates are NULL, it uses today's date:

SELECT id,letter,IFNULL(date,dt) date FROM mytable,
(
    SELECT IF(K=0,DATE(NOW()),avgdt) dt FROM
    (SELECT DATE(FROM_UNIXTIME(AVG(UNIX_TIMESTAMP(date))))
    avgdt FROM mytable) AA,
    (SELECT COUNT(date) K FROM mytable) BB
) A;

QUERY #2 EXECUTED

mysql> SELECT id,letter,IFNULL(date,dt) date FROM mytable,
    -> (
    ->     SELECT IF(K=0,DATE(NOW()),avgdt) dt FROM
    ->     (SELECT DATE(FROM_UNIXTIME(AVG(UNIX_TIMESTAMP(date))))
    ->     avgdt FROM mytable) AA,
    ->     (SELECT COUNT(date) K FROM mytable) BB
    -> ) A;
+----+--------+------------+
| id | letter | date       |
+----+--------+------------+
|  1 | A      | 2012-01-01 |
|  2 | B      | 2012-01-08 |
|  3 | C      | 2012-01-08 |
|  4 | D      | 2012-01-15 |
+----+--------+------------+
4 rows in set (0.05 sec)

mysql>

Give it a Try !!!

Upvotes: 1

Mike Dinescu
Mike Dinescu

Reputation: 55750

Assuming you have a table called T like this:

CREATE TABLE T(
    id INT,
    time DATETIME
);

The following query would give you boundaries for each NULL record:

SELECT T.Id
     , MAX(T1.Time) as MinDate
     , MIN(T2.Time) as MaxDate     
  FROM T
INNER JOIN T T1 ON T1.Id < T.Id
               AND T.time IS NULL 
               AND NOT T1.time IS NULL
INNER JOIN T T2 ON T2.id > T.id
               AND T.time IS NULL
               AND NOT T2.time IS NULL
GROUP BY Id

The output would be:

Id  MinDate     MaxDate
2   2012-01-01  2012-01-15
3   2012-01-01  2012-01-15

So the next step would be to do an update using the values from this result-set to update the NULLs with an average for instance..

UPDATE T
INNER JOIN 
(
   SELECT T.Id, MAX(T1.Time) as MinTime, MIN(T2.Time) as MaxTime
     FROM T
   INNER JOIN T T1 ON T1.id < T.id
                 AND T.time IS NULL 
                 AND NOT T1.time IS NULL
   INNER JOIN T T2 ON T2.id > T.id
                 AND T.time IS NULL
                 AND NOT T2.time IS NULL    
   GROUP BY T.ID) T3
 ON T3.id = T.id  
 SET T.time = FROM_UNIXTIME((UNIX_TIMESTAMP(T3.MinTime) + UNIX_TIMESTAMP(T3.MaxTime)) / 2)
 WHERE T.time IS NULL

Working SQLFiddle Here

Upvotes: 1

Related Questions