Gera Kuiv
Gera Kuiv

Reputation: 31

SQL insert values from previous date if specific date information is missing

I have got the following table.

date2                     Group    number
2020-28-05 00:00:00         A        55
2020-28-05 00:00:00         B        1.09
2020-28-05 00:00:00         C        1.8
2020-29-05 00:00:00         A        68
2020-29-05 00:00:00         B        1.9
2020-29-05 00:00:00         C        1.19
2020-01-06 00:00:00         A        10
2020-01-06 00:00:00         B        15
2020-01-06 00:00:00         C        0.88
2020-02-06 00:00:00         A        22
2020-02-06 00:00:00         B        15
2020-02-06 00:00:00         C        13
2020-03-06 00:00:00         A        66
2020-03-06 00:00:00         B        88
2020-03-06 00:00:00         C        99

As you can see between dates 2020-30-05 and 2020-31-05 are missing in this table. So it is necessary to fill these dates with 2020-29-05 information grouped by GROUP. As a result the final output should be like that:

date2                     Group    number
2020-28-05 00:00:00         A        55
2020-28-05 00:00:00         B        1.09
2020-28-05 00:00:00         C        1.8
2020-29-05 00:00:00         A        68
2020-29-05 00:00:00         B        1.9
2020-29-05 00:00:00         C        1.19
2020-30-05 00:00:00         A        68
2020-30-05 00:00:00         B        1.9
2020-30-05 00:00:00         C        1.19
2020-31-05 00:00:00         A        68
2020-31-05 00:00:00         B        1.9
2020-31-05 00:00:00         C        1.19
2020-01-06 00:00:00         A        10
2020-01-06 00:00:00         B        15
2020-01-06 00:00:00         C        0.88
2020-02-06 00:00:00         A        22
2020-02-06 00:00:00         B        15
2020-02-06 00:00:00         C        13
2020-03-06 00:00:00         A        66
2020-03-06 00:00:00         B        88
2020-03-06 00:00:00         C        99

I tried to do in the following way: create a temporary table (table B) with only dates for period 2020-28-05 till 2020-03-06 and then use left merge, thus making these new dates as null (in order to then insert a CASE when null, so fill in last_value). However, it does not work, because when merging I got nulls only for one date (but should be 3 times one date(because of groups). This is only part of the larger dataset, can you help how can I get the necessary output? PS I use Vertica

Upvotes: 0

Views: 553

Answers (1)

marcothesane
marcothesane

Reputation: 6749

It's Vertica. And Vertica has the TIMESERIES clause, which seems to exactly match with what you need:

Out of a time series - like you have one - with irregular intervals between the rows, or with longer gaps in an otherwise regular time series, it creates a regular time series, with the same interval between each row pair as you specify in the AS sub-clause of the TIMESERIES clause itself. TS_FIRST_VALUE() and TS_LAST_VALUE() are functions that rely on that clause and return the right value deduced from the input rows at the generated time stamp. This right value can be obtained 'const', that is from the row in the original row set closest to the generated time stamp, or 'linear', that is, interpolated from the original row just before and the original row just after the generated timestamp. For your needs, you would use the constant value. See here:

WITH
-- your input ....
input(tmstmp,grp,nbr) AS (
          SELECT TIMESTAMP '2020-05-28 00:00:00','A',55
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','B',1.09
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','C',1.8
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','A',68
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','B',1.9
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','C',1.19
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','A',10
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','C',0.88
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','A',22
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','C',13
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','A',66
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','B',88
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','C',99
)
-- real query here ...
SELECT
  ts AS tmstmp
, grp
, TS_FIRST_VALUE(nbr,'const') AS nbr
FROM input
TIMESERIES ts AS '1 DAY' OVER(PARTITION BY grp ORDER BY tmstmp)
ORDER BY 1,2
;
-- out        tmstmp        | grp |  nbr  
-- out ---------------------+-----+-------
-- out  2020-05-28 00:00:00 | A   | 55.00
-- out  2020-05-28 00:00:00 | B   |  1.09
-- out  2020-05-28 00:00:00 | C   |  1.80
-- out  2020-05-29 00:00:00 | A   | 68.00
-- out  2020-05-29 00:00:00 | B   |  1.90
-- out  2020-05-29 00:00:00 | C   |  1.19
-- out  2020-05-30 00:00:00 | A   | 68.00
-- out  2020-05-30 00:00:00 | B   |  1.90
-- out  2020-05-30 00:00:00 | C   |  1.19
-- out  2020-05-31 00:00:00 | A   | 68.00
-- out  2020-05-31 00:00:00 | B   |  1.90
-- out  2020-05-31 00:00:00 | C   |  1.19
-- out  2020-06-01 00:00:00 | A   | 10.00
-- out  2020-06-01 00:00:00 | B   | 15.00
-- out  2020-06-01 00:00:00 | C   |  0.88
-- out  2020-06-02 00:00:00 | A   | 22.00
-- out  2020-06-02 00:00:00 | B   | 15.00
-- out  2020-06-02 00:00:00 | C   | 13.00
-- out  2020-06-03 00:00:00 | A   | 66.00
-- out  2020-06-03 00:00:00 | B   | 88.00

Upvotes: 2

Related Questions