Michael
Michael

Reputation: 447

Take the first row from every time window

I have 2 columns: one column is a time column and the other column is some sort of boolean type column:

GMT  VAL
2010-08-01 10:59:32   1
2010-08-01 10:59:33   0
2010-08-01 10:59:34   1
2010-08-01 10:59:36   0
2010-08-01 10:59:38   1
2010-08-01 10:59:41   1
2010-08-01 10:59:43   0
2010-08-01 10:59:45   1
2010-08-01 10:59:47   0
2010-08-01 10:59:53   1

I want to take the first row from every window of 10 seconds.

GMT  VAL
2010-08-01 10:59:32   1
2010-08-01 10:59:43   0

How do I do that ?

Upvotes: 0

Views: 231

Answers (2)

marcothesane
marcothesane

Reputation: 6749

Vertica at its best - although you do get a timeseries snapped to the next 10-second border to start out of it, thus returning different rows if you don't fix that.

If you actually do need the exact original timestamps, then add to the 10-second-snapped time slices the difference between the minimal timeseries timestamp to the next minimal actual timestamp - that would be 2 seconds in this specific case - check the tb and the ts Common Table Expressions below.

WITH
-- your input ...
input(gmt,val) AS (
          SELECT TIMESTAMP '2010-08-01 10:59:32',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:33',0
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:34',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:36',0
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:38',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:41',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:43',0
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:45',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:47',0
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:53',1
)
,
-- create the timeseries - I decide to snap it to exact 10-second time slices
-- use the Vertica TIME SLICE function to create the limits of the time series
tm(tm) AS (
            SELECT MIN(TIME_SLICE(gmt,10,'SECOND','START')) AS tm FROM input
  UNION ALL SELECT MAX(TIME_SLICE(gmt,10,'SECOND','START')) AS tm FROM input
)
,
-- use Vertica's TIMESERIES clause to actually create the time series
-- which will be snapped to 10-second borders
tb(tb) AS (
  SELECT tb 
  FROM tm
  TIMESERIES tb AS '10 SECONDS' OVER(ORDER BY tm)
)
,
 -- add the difference between timeseries timestamp and actual timestamp
ts(ts) AS (
  SELECT 
    tb +( (SELECT MIN(gmt) FROM INPUT) - (SELECT MIN(tb) FROM tb) )
  FROM tb
)
-- finally, use the "Event Series Join"
-- - That's the INTERPOLATE PREVIOUS VALUE predicate - 
-- to apply an outer join
SELECT
  gmt
, ts AS control_ts
, val
FROM input
LEFT
JOIN ts
  ON gmt INTERPOLATE PREVIOUS VALUE ts
WHERE gmt IS NOT NULL
-- Vertica's Analytic Limit Clause
LIMIT 1 OVER(PARTITION BY ts ORDER BY gmt)
;

Returns:

         gmt         |     control_ts      | val 
---------------------+---------------------+-----
 2010-08-01 10:59:32 | 2010-08-01 10:59:32 |   1
 2010-08-01 10:59:43 | 2010-08-01 10:59:42 |   0
 2010-08-01 10:59:53 | 2010-08-01 10:59:52 |   1

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1270463

You can use row_number():

select t.*
from (select t.*,
             row_nubmer() over (partition by date_trunc('minute', gmt), floor(extract(seconds from gmt) / 6)
                                order by gmt) as seqnum
      from t
     ) t
where seqnum = 1;

You could also convert to a string:

select t.*
from (select t.*,
             row_nubmer() over (partition by left(to_char(gmt, 'YYYYMMDDHH24MMSS'), 13)
                                order by gmt) as seqnum
      from t
     ) t
where seqnum = 1;

Or using epoch:

select t.*
from (select t.*,
             row_nubmer() over (partition by floor(extract(epoch from gmt) / 10)                                   order by gmt) as seqnum
      from t
     ) t
where seqnum = 1;

Upvotes: 2

Related Questions