colin_froggatt
colin_froggatt

Reputation: 147

SQLite: merge rows in same table using older record to fill blanks in newer record

I have a SQLite data table that stores records with composite key of <id,rdate>. New records are added for each id on different dates, but new records often don't complete all columns. Incomplete columns may be null or whitespace e.g.

   | id  | rdate      | data1   | data2   |
   ----------------------------------------
0  | 1   | 01/01/2009 | foo     | boo     |
1  | 1   | 04/01/2010 | foo1    | bar1    |
2  | 1   | 08/01/2010 | fooX    | <null>  |
3  | 2   | 01/01/2010 | foo2    | bar2    |
4  | 2   | 04/01/2010 |         |         |
5  | 3   | 01/01/2010 | foo3    | bar3    |
   ----------------------------------------

I want to periodically update records, with the same id, to fill blank columns in the most recent record (by rdate) with data from the previous record. In the example above, data from row 1 is used to fill blank columns in row 2

So the table, after running the query, would look like this:

   | id | rdate      | data1   | data2   |
   ----------------------------------------
0  | 1   | 01/01/2009 | foo     | boo     |
1  | 1   | 04/01/2010 | foo1    | bar1    |
2  | 1   | 08/01/2010 | fooX    | bar1    |
3  | 2   | 01/01/2010 | foo2    | bar2    |
4  | 2   | 04/01/2010 | foo2    | bar2    |
5  | 3   | 01/01/2010 | foo3    | bar3    |
   ----------------------------------------

I have tried to construct a query to do this but I am struggling with how to approach this, or even if it can be done.

This question looks at merging records but from a de-dup perspective. I haven't been able to find anything doing what I need. COALESCE looks promising, but I haven't been able to figure out how to build the query to use it.

Help and suggestions very much appreciated.

Upvotes: 1

Views: 168

Answers (2)

forpas
forpas

Reputation: 164089

For each of the columns data1 and data2 use a correlated subquery that returns the last previous non null value in that column:

UPDATE tablename AS t1
SET data1 = COALESCE(
              NULLIF(TRIM(t1.data1), ''),
              (SELECT t2.data1 FROM tablename t2 
               WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND NULLIF(TRIM(t2.data1), '') IS NOT NULL 
               ORDER BY t2.rdate DESC LIMIT 1)
            ),
    data2 = COALESCE(
              NULLIF(TRIM(t1.data2), ''),
              (SELECT t2.data2 FROM tablename t2 
               WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND NULLIF(TRIM(t2.data2), '') IS NOT NULL 
               ORDER BY t2.rdate DESC LIMIT 1)
            )            
WHERE NULLIF(TRIM(t1.data1), '') IS NULL OR NULLIF(TRIM(t1.data2), '') IS NULL

See the demo.

But it would be better to update the table so every empty value is replaced with null:

UPDATE tablename
SET data1 = NULLIF(TRIM(data1), ''),
    data2 = NULLIF(TRIM(data2), '')
WHERE TRIM(data1) = '' OR TRIM(data2) = ''

and then the code can be simplified:

UPDATE tablename AS t1
SET data1 = COALESCE(
              t1.data1,
              (SELECT t2.data1 FROM tablename t2 
               WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND t2.data1 IS NOT NULL 
               ORDER BY t2.rdate DESC LIMIT 1)
            ),
    data2 = COALESCE(
              t1.data2,
              (SELECT t2.data2 FROM tablename t2 
               WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND t2.data2 IS NOT NULL 
               ORDER BY t2.rdate DESC LIMIT 1)
            )            
WHERE data1 IS NULL OR data2 IS NULL

See the demo.

Results:

id rdate data1 data2
1 2009-01-01 foo boo
1 2010-01-04 foo1 bar1
1 2010-01-08 fooX bar1
2 2010-01-01 foo2 bar2
2 2010-01-04 foo2 bar2
3 2010-01-01 foo3 bar3

Note that your dates as they are in the sample data are not comparable.
Change them to the format 'YYYY-MM-DD'.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269593

You can use update with two correlated subqueries:

update t
    set data1 = coalesce(data1,
                         (select t2.data1
                          from t t2
                          where t2.id = t.id and
                                t2.rdate < t.rdate and
                                t2.data1 is not null
                         )
                        ),
        data2 = coalesce(data1,
                         (select t2.data2
                          from t t2
                          where t2.id = t.id and
                                t2.rdate < t.rdate and
                                t2.data2 is not null
                         )
                        )
    where data1 is null or data2 is null;

To handle all spaces, you have to modify lots little things in the query:

UPDATE tablename as t1
    SET data1 = (CASE WHEN TRIM(t1.data1) <> ''
                      THEN t1.data1
                      ELSE (SELECT t2.data1
                            FROM tablename t2 
                            WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND trim(t2.data1) <> '' 
                            ORDER BY t2.rdate DESC
                            LIMIT 1
                           )
                 END),
        data2 = (CASE WHEN TRIM(t1.data2) <> ''
                      THEN t1.data2
                      ELSE (SELECT t2.data2
                            FROM tablename t2 
                            WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND trim(t2.data2) <> ''
                            ORDER BY t2.rdate DESC
                            LIMIT 1
                           )
                 END)            
    WHERE data1 IS NULL OR data2 IS NULL or trim(data1) = '' or trim(data2) = '';

Upvotes: 1

Related Questions