Polly
Polly

Reputation: 1097

Add rows with date and fill empty cells with previous row value

I have a following table in Vertica:

Item_id  event_date  Price
   A     2019-01-01  100
   A     2019-01-04  200
   B     2019-01-05  150
   B     2019-01-06  250
   B     2019-01-09  350

As you see, there are some missing dates between 2019-01-04 and 2019-01-01, and also 2019-01-09 - 2019-01-06. What I need is to add for each item_id missing dates between the existing ones, and as the price cell will be NULL, fill it with the previous date Price. So it will be like this:

Item_id  event_date  Price
   A     2019-01-01  100
   A     2019-01-02  100
   A     2019-01-03  100
   A     2019-01-04  200
   B     2019-01-05  150
   B     2019-01-06  250
   B     2019-01-07  250
   B     2019-01-08  250
   B     2019-01-09  350

I tried to go with

SELECT Item_id, event_date
       CASE Price WHEN 0 THEN NVL( LAG( CASE Price WHEN 0 THEN NULL ELSE Price END ) IGNORE NULLS OVER ( ORDER BY NULL ), 0 ) ELSE Price END AS Price_new
FROM   item_price_table

from this article https://blog.jooq.org/2015/12/17/how-to-fill-sparse-data-with-the-previous-non-empty-value-in-sql/ , but it seems it works for SQL Server, but not for Vertica, as there are no IGNORE NULLS function... Does anyone know how to deal with it?

Upvotes: 1

Views: 1267

Answers (3)

marcothesane
marcothesane

Reputation: 6749

I was waiting for that one....!

I just love Vertica's TIMESERIES clause ! It works on TIMESTAMPs, not DATEs, so I have to cast back and forth, but it's unbeatable.

See here:

WITH                                                                                                                                                                                                        
input(item_id,event_dt,price) AS (
          SELECT 'A',DATE '2019-01-01',100
UNION ALL SELECT 'A',DATE '2019-01-04',200
UNION ALL SELECT 'B',DATE '2019-01-05',150
UNION ALL SELECT 'B',DATE '2019-01-06',250
UNION ALL SELECT 'B',DATE '2019-01-09',350
)
SELECT
  item_id
, event_dts::DATE AS event_dt
, TS_FIRST_VALUE(price) AS price
FROM input
TIMESERIES event_dts AS '1 DAY' OVER(PARTITION BY item_id ORDER BY event_dt::timestamp)
-- out  item_id |  event_dt  | price 
-- out ---------+------------+-------
-- out  A       | 2019-01-01 |   100
-- out  A       | 2019-01-02 |   100
-- out  A       | 2019-01-03 |   100
-- out  A       | 2019-01-04 |   200
-- out  B       | 2019-01-05 |   150
-- out  B       | 2019-01-06 |   250
-- out  B       | 2019-01-07 |   250
-- out  B       | 2019-01-08 |   250
-- out  B       | 2019-01-09 |   350
-- out (9 rows)
-- out 
-- out Time: First fetch (9 rows): 68.057 ms. All rows formatted: 68.221 ms
;

Need explanations?

Happy playing ...

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270421

Let me assume you have a calendar table. In Vertica, you can then use last_value(ignore nulls) to fill in the rest:

select c.event_date, i.item_id,
       coalesce(ipt.price,
                last_value(ipt.price ignore nulls) over (partition by i.item_id order by c.event_date)
               ) as price
from calendar c cross join
     (select distinct item_id from item_price_table) i left join
     item_price_table ipt
     on i.item_it = ipt.item_id and c.date = ipt.event_date

Upvotes: 2

Josh
Josh

Reputation: 1601

I don't have the code on the top of my head, but to add the missing dates you'll want to create a Calendar table and join to that. Then you can use the lag function to replace the null Price with the one above it. There's plenty of code if you search CTE to create calendar table.

Upvotes: 0

Related Questions