How can I select the first and the last row for each set returned

I have the following data which I want to select as follows: How can I modify the query to select the output as shown below?

select primary_id, timestamp, secondary_id,... from tablename where 
timestamp <= to_timestamp('2020-07-29 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and 
timestamp <  to_timestamp('2020-07-29 04:00:00', 'YYYY-MM-DD HH24:MI:SS') 
order by timestamp, secondary_id;

primary_id timestamp    secondary_id  attribute1  attribute2  ... -- I want to get
-------------------------------------------------------------------
1          2020/01/20   10            ...         ...         ... -- <- this
2          2020/02/28   10            ...         ...         ...
3          2020/03/01   10            ...         ...         ... -- <- and this
4          2020/04/08   20            ...         ...         ... -- <- this
5          2020/05/31   20            ...         ...         ...
6          2020/06/30   20            ...         ...         ...
7          2020/06/31   20            ...         ...         ...
8          2020/07/31   20            ...         ...         ... -- <- and this

Upvotes: 0

Answers (4)

dnoeth

Reputation: 60482

This also works when the value of secondary_id can be repeated in another group of rows, it simply checks if the current id is different from the previous or next row:

select *
from (
    select 
        t.*, 
        lag(secondary_id) over(order by timestamp asc ) lag_id,
        lead(secondary_id) over(order by timestamp asc) lead_id
    from tablename t
    where timestamp <= timestamp '2020-07-29 00:00:00'
      and timestamp <  timestamp '2020-07-29 04:00:00'
) t
where lag_id is null 
   or lead_id is null
   or lag_id <> secondary_id
   or lead_id <> secondary_id
order by timestamp, secondary_id;

Should be quite efficient as there's the same ORDER BY for both LEAD & LAG.

Upvotes: 2

Giliam

Reputation: 620

You could use first_value and last_value. These are analytics functions and can be used like in the demo below.

with demo_data ( primary_id, secondary_id, timestamp)as
( select 1, 10, date '2020-01-01' from dual
  union all
  select 2 ,10, date '2020-01-28' from dual
  union all
  select 3, 10, date '2020-02-03' from dual
  union all
  select 4, 20, date '2020-03-02' from dual
  union all
  select 5, 20, date '2020-03-15' from dual
)
, grouped_data as
( select primary_id,
         secondary_id,
         timestamp,
         decode(first_value(primary_id) over(partition by secondary_id order by timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ), primary_id, 'Y', 'N') first_row_in_group,
         decode(last_value(primary_id) over(partition by secondary_id order by timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), primary_id, 'Y', 'N') last_row_in_group
  from   demo_data
)
select primary_id, secondary_id, timestamp
from   grouped_data s
where  first_row_in_group = 'Y' or last_row_in_group = 'Y'
/

Upvotes: 0

Jim Macaulay

Reputation: 5155

Please use below query,

select primary_id, timestamp, secondary_id,... from
(select primary_id, timestamp, secondary_id,..., 
row_number() over (partition by secondary_id  order by timestamp) as rnk1,
row_number() over (partition by secondary_id  order by timestamp desc) as rnk2
from tablename where 
timestamp <= to_timestamp('2020-07-29 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and 
timestamp <  to_timestamp('2020-07-29 04:00:00', 'YYYY-MM-DD HH24:MI:SS') ) qry
where rnk1=1 and rnk2 = 1
order by timestamp, secondary_id;

Upvotes: 0

GMB

Reputation: 222582

You can use window functions to rank records having the same secondary_id by ascending and descending timestamp, and then use that information to filter in the first and last record per group:

select primary_id, timestamp, secondary_id, ... 
from (
    select 
        t.*, 
        row_number() over(partition by secondary_id order by timestamp asc ) rn_asc,
        row_number() over(partition by secondary_id order by timestamp desc) rn_desc
    from tablename t
    where 
            timestamp <= timestamp '2020-07-29 00:00:00'
        and timestamp <  timestamp '2020-07-29 04:00:00'
) t
where 1 in (rn_asc, rn_desc)
order by timestamp, secondary_id;

Note that you don't need to_timestamp() to convert these literal strings: you can use literal dates instead.

Upvotes: 3

How can I select the first and the last row for each set returned

Answers (4)

Related Questions