Reputation: 147
I have a SQLite data table that stores records with composite key of <id,rdate>. New records are added for each id on different dates, but new records often don't complete all columns. Incomplete columns may be null or whitespace e.g.
| id | rdate | data1 | data2 |
----------------------------------------
0 | 1 | 01/01/2009 | foo | boo |
1 | 1 | 04/01/2010 | foo1 | bar1 |
2 | 1 | 08/01/2010 | fooX | <null> |
3 | 2 | 01/01/2010 | foo2 | bar2 |
4 | 2 | 04/01/2010 | | |
5 | 3 | 01/01/2010 | foo3 | bar3 |
----------------------------------------
I want to periodically update records, with the same id, to fill blank columns in the most recent record (by rdate) with data from the previous record. In the example above, data from row 1
is used to fill blank columns in row 2
So the table, after running the query, would look like this:
| id | rdate | data1 | data2 |
----------------------------------------
0 | 1 | 01/01/2009 | foo | boo |
1 | 1 | 04/01/2010 | foo1 | bar1 |
2 | 1 | 08/01/2010 | fooX | bar1 |
3 | 2 | 01/01/2010 | foo2 | bar2 |
4 | 2 | 04/01/2010 | foo2 | bar2 |
5 | 3 | 01/01/2010 | foo3 | bar3 |
----------------------------------------
I have tried to construct a query to do this but I am struggling with how to approach this, or even if it can be done.
This question looks at merging records but from a de-dup perspective. I haven't been able to find anything doing what I need. COALESCE
looks promising, but I haven't been able to figure out how to build the query to use it.
Help and suggestions very much appreciated.
Upvotes: 1
Views: 168
Reputation: 164089
For each of the columns data1
and data2
use a correlated subquery that returns the last previous non null value in that column:
UPDATE tablename AS t1
SET data1 = COALESCE(
NULLIF(TRIM(t1.data1), ''),
(SELECT t2.data1 FROM tablename t2
WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND NULLIF(TRIM(t2.data1), '') IS NOT NULL
ORDER BY t2.rdate DESC LIMIT 1)
),
data2 = COALESCE(
NULLIF(TRIM(t1.data2), ''),
(SELECT t2.data2 FROM tablename t2
WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND NULLIF(TRIM(t2.data2), '') IS NOT NULL
ORDER BY t2.rdate DESC LIMIT 1)
)
WHERE NULLIF(TRIM(t1.data1), '') IS NULL OR NULLIF(TRIM(t1.data2), '') IS NULL
See the demo.
But it would be better to update the table so every empty value is replaced with null
:
UPDATE tablename
SET data1 = NULLIF(TRIM(data1), ''),
data2 = NULLIF(TRIM(data2), '')
WHERE TRIM(data1) = '' OR TRIM(data2) = ''
and then the code can be simplified:
UPDATE tablename AS t1
SET data1 = COALESCE(
t1.data1,
(SELECT t2.data1 FROM tablename t2
WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND t2.data1 IS NOT NULL
ORDER BY t2.rdate DESC LIMIT 1)
),
data2 = COALESCE(
t1.data2,
(SELECT t2.data2 FROM tablename t2
WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND t2.data2 IS NOT NULL
ORDER BY t2.rdate DESC LIMIT 1)
)
WHERE data1 IS NULL OR data2 IS NULL
See the demo.
Results:
id | rdate | data1 | data2 |
---|---|---|---|
1 | 2009-01-01 | foo | boo |
1 | 2010-01-04 | foo1 | bar1 |
1 | 2010-01-08 | fooX | bar1 |
2 | 2010-01-01 | foo2 | bar2 |
2 | 2010-01-04 | foo2 | bar2 |
3 | 2010-01-01 | foo3 | bar3 |
Note that your dates as they are in the sample data are not comparable.
Change them to the format 'YYYY-MM-DD'
.
Upvotes: 1
Reputation: 1269593
You can use update
with two correlated subqueries:
update t
set data1 = coalesce(data1,
(select t2.data1
from t t2
where t2.id = t.id and
t2.rdate < t.rdate and
t2.data1 is not null
)
),
data2 = coalesce(data1,
(select t2.data2
from t t2
where t2.id = t.id and
t2.rdate < t.rdate and
t2.data2 is not null
)
)
where data1 is null or data2 is null;
To handle all spaces, you have to modify lots little things in the query:
UPDATE tablename as t1
SET data1 = (CASE WHEN TRIM(t1.data1) <> ''
THEN t1.data1
ELSE (SELECT t2.data1
FROM tablename t2
WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND trim(t2.data1) <> ''
ORDER BY t2.rdate DESC
LIMIT 1
)
END),
data2 = (CASE WHEN TRIM(t1.data2) <> ''
THEN t1.data2
ELSE (SELECT t2.data2
FROM tablename t2
WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND trim(t2.data2) <> ''
ORDER BY t2.rdate DESC
LIMIT 1
)
END)
WHERE data1 IS NULL OR data2 IS NULL or trim(data1) = '' or trim(data2) = '';
Upvotes: 1