Reputation: 660
I'm using Vertica, which precludes me from using CROSS APPLY, unfortunately. And apparently there's no such thing as CTEs in Vertica.
Here's what I've got:
t:
day | id | metric | d_metric
-----------+----+--------+----------
2011-12-01 | 1 | 10 | 10
2011-12-03 | 1 | 12 | 2
2011-12-04 | 1 | 15 | 3
Note that on the first day, the delta is equal to the metric value. I'd like to fill in the gaps, like this:
t_fill:
day | id | metric | d_metric
-----------+----+--------+----------
2011-12-01 | 1 | 10 | 10
2011-12-02 | 1 | 10 | 0 -- a delta of 0
2011-12-03 | 1 | 12 | 2
2011-12-04 | 1 | 15 | 3
I've thought of a way to do this day by day, but what I'd really like is a solution that works in one go.
I think I could get something working with LAST_VALUE, but I can't come up with the right JOIN statements that will let me properly partition and order on each id's day-by-day history.
edit: assume I have a table like this:
calendar:
day
------------
2011-01-01
2011-01-02
...
that can be involved with joins. My intent would be to maintain the date range in calendar to match the date range in t.
edit: A few more notes on what I'm looking for, just to be specific:
In generating t_fill, I'd like to exactly cover the date range in t, as well as any dates that are missing in between. So a correct t_fill will start on the same date and end on the same date as t. t_fill has two properties:
1) once an id appears on some date, it will always have a row for each later date. This is the gap-filling implied in the original question.
2) Should no row for an id ever appear again after some date, the t_fill solution should merrily generate rows with the same metric value (and 0 delta) from the date of that last data point up to the end date of t.
A solution might backfill earlier dates up to the start of the date range in t. That is, for any id that appears after the first date in t, rows between the first date in t and the first date for the id will be filled with metric=0 and d_metric=0. I don't prefer this kind of solution, since it has a higher growth factor for each id that enters the system. But I could easily deal with it by selecting into a new table only rows where metric!=0 and d_metric!=0.
Upvotes: 2
Views: 1697
Reputation: 44250
This about what Jonathan Leffler proposed, but into old-fashioned low-level SQL (without fancy CTE's or window functions or aggregating subqueries):
SET search_path='tmp'
DROP TABLE ttable CASCADE;
CREATE TABLE ttable
( zday date NOT NULL
, id INTEGER NOT NULL
, metric INTEGER NOT NULL
, d_metric INTEGER NOT NULL
, PRIMARY KEY (id,zday)
);
INSERT INTO ttable(zday,id,metric,d_metric) VALUES
('2011-12-01',1,10,10)
,('2011-12-03',1,12,2)
,('2011-12-04',1,15,3)
;
DROP TABLE ctable CASCADE;
CREATE TABLE ctable
( zday date NOT NULL
, PRIMARY KEY (zday)
);
INSERT INTO ctable(zday) VALUES
('2011-12-01')
,('2011-12-02')
,('2011-12-03')
,('2011-12-04')
;
CREATE VIEW v_cte AS (
SELECT t.zday,t.id,t.metric,t.d_metric
FROM ttable t
JOIN ctable c ON c.zday = t.zday
UNION
SELECT c.zday,t.id,t.metric, 0
FROM ctable c, ttable t
WHERE t.zday < c.zday
AND NOT EXISTS ( SELECT *
FROM ttable nx
WHERE nx.id = t.id
AND nx.zday = c.zday
)
AND NOT EXISTS ( SELECT *
FROM ttable nx
WHERE nx.id = t.id
AND nx.zday < c.zday
AND nx.zday > t.zday
)
)
;
SELECT * FROM v_cte;
The results:
zday | id | metric | d_metric
------------+----+--------+----------
2011-12-01 | 1 | 10 | 10
2011-12-02 | 1 | 10 | 0
2011-12-03 | 1 | 12 | 2
2011-12-04 | 1 | 15 | 3
(4 rows)
Upvotes: 1
Reputation: 754570
Given the complete calendar table, it is doable, though not exactly trivial. Without the calendar table, it would be a lot harder.
Your query needs to be stated moderately precisely, which is usually half the battle in any issue with 'how to write the query'. I think you are looking for:
This gives you a complete list of dates with metrics.
You then need to self-join two copies of that list with dates one day apart to form the deltas.
Note that if some ID values don't appear at the start of the date range, they won't show up.
With that as guidance, you should be able get going, I believe.
Upvotes: 0
Reputation: 1580
If you want to use something like a CTE, how about using a temporary table? Essentially, a CTE is a view for a particular query.
Depending on your needs you can make the temporary table transaction or session-scoped.
I'm still curious to know why gap-filling with constant-interpolation wouldn't work here.
Upvotes: 0
Reputation: 397
I am not Vertica user, but if you do not want to use their native support for GAP fillings, here you can find a more generic SQL-only solution to do so.
Upvotes: 0