Reputation: 447
I have 2 columns: one column is a time column and the other column is some sort of boolean type column:
GMT VAL
2010-08-01 10:59:32 1
2010-08-01 10:59:33 0
2010-08-01 10:59:34 1
2010-08-01 10:59:36 0
2010-08-01 10:59:38 1
2010-08-01 10:59:41 1
2010-08-01 10:59:43 0
2010-08-01 10:59:45 1
2010-08-01 10:59:47 0
2010-08-01 10:59:53 1
I want to take the first row from every window of 10 seconds.
GMT VAL
2010-08-01 10:59:32 1
2010-08-01 10:59:43 0
How do I do that ?
Upvotes: 0
Views: 231
Reputation: 6749
Vertica at its best - although you do get a timeseries snapped to the next 10-second border to start out of it, thus returning different rows if you don't fix that.
If you actually do need the exact original timestamps, then add to the 10-second-snapped time slices the difference between the minimal timeseries timestamp to the next minimal actual timestamp - that would be 2 seconds in this specific case - check the tb
and the ts
Common Table Expressions below.
WITH
-- your input ...
input(gmt,val) AS (
SELECT TIMESTAMP '2010-08-01 10:59:32',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:33',0
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:34',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:36',0
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:38',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:41',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:43',0
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:45',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:47',0
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:53',1
)
,
-- create the timeseries - I decide to snap it to exact 10-second time slices
-- use the Vertica TIME SLICE function to create the limits of the time series
tm(tm) AS (
SELECT MIN(TIME_SLICE(gmt,10,'SECOND','START')) AS tm FROM input
UNION ALL SELECT MAX(TIME_SLICE(gmt,10,'SECOND','START')) AS tm FROM input
)
,
-- use Vertica's TIMESERIES clause to actually create the time series
-- which will be snapped to 10-second borders
tb(tb) AS (
SELECT tb
FROM tm
TIMESERIES tb AS '10 SECONDS' OVER(ORDER BY tm)
)
,
-- add the difference between timeseries timestamp and actual timestamp
ts(ts) AS (
SELECT
tb +( (SELECT MIN(gmt) FROM INPUT) - (SELECT MIN(tb) FROM tb) )
FROM tb
)
-- finally, use the "Event Series Join"
-- - That's the INTERPOLATE PREVIOUS VALUE predicate -
-- to apply an outer join
SELECT
gmt
, ts AS control_ts
, val
FROM input
LEFT
JOIN ts
ON gmt INTERPOLATE PREVIOUS VALUE ts
WHERE gmt IS NOT NULL
-- Vertica's Analytic Limit Clause
LIMIT 1 OVER(PARTITION BY ts ORDER BY gmt)
;
Returns:
gmt | control_ts | val
---------------------+---------------------+-----
2010-08-01 10:59:32 | 2010-08-01 10:59:32 | 1
2010-08-01 10:59:43 | 2010-08-01 10:59:42 | 0
2010-08-01 10:59:53 | 2010-08-01 10:59:52 | 1
Upvotes: 0
Reputation: 1270463
You can use row_number()
:
select t.*
from (select t.*,
row_nubmer() over (partition by date_trunc('minute', gmt), floor(extract(seconds from gmt) / 6)
order by gmt) as seqnum
from t
) t
where seqnum = 1;
You could also convert to a string:
select t.*
from (select t.*,
row_nubmer() over (partition by left(to_char(gmt, 'YYYYMMDDHH24MMSS'), 13)
order by gmt) as seqnum
from t
) t
where seqnum = 1;
Or using epoch
:
select t.*
from (select t.*,
row_nubmer() over (partition by floor(extract(epoch from gmt) / 10) order by gmt) as seqnum
from t
) t
where seqnum = 1;
Upvotes: 2