Reputation: 115
I have the following data which I want to select as follows: How can I modify the query to select the output as shown below?
select primary_id, timestamp, secondary_id,... from tablename where
timestamp <= to_timestamp('2020-07-29 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and
timestamp < to_timestamp('2020-07-29 04:00:00', 'YYYY-MM-DD HH24:MI:SS')
order by timestamp, secondary_id;
primary_id timestamp secondary_id attribute1 attribute2 ... -- I want to get
-------------------------------------------------------------------
1 2020/01/20 10 ... ... ... -- <- this
2 2020/02/28 10 ... ... ...
3 2020/03/01 10 ... ... ... -- <- and this
4 2020/04/08 20 ... ... ... -- <- this
5 2020/05/31 20 ... ... ...
6 2020/06/30 20 ... ... ...
7 2020/06/31 20 ... ... ...
8 2020/07/31 20 ... ... ... -- <- and this
Upvotes: 0
Views: 89
Reputation: 60482
This also works when the value of secondary_id can be repeated in another group of rows, it simply checks if the current id is different from the previous or next row:
select *
from (
select
t.*,
lag(secondary_id) over(order by timestamp asc ) lag_id,
lead(secondary_id) over(order by timestamp asc) lead_id
from tablename t
where timestamp <= timestamp '2020-07-29 00:00:00'
and timestamp < timestamp '2020-07-29 04:00:00'
) t
where lag_id is null
or lead_id is null
or lag_id <> secondary_id
or lead_id <> secondary_id
order by timestamp, secondary_id;
Should be quite efficient as there's the same ORDER BY for both LEAD & LAG.
Upvotes: 2
Reputation: 620
You could use first_value and last_value. These are analytics functions and can be used like in the demo below.
with demo_data ( primary_id, secondary_id, timestamp)as
( select 1, 10, date '2020-01-01' from dual
union all
select 2 ,10, date '2020-01-28' from dual
union all
select 3, 10, date '2020-02-03' from dual
union all
select 4, 20, date '2020-03-02' from dual
union all
select 5, 20, date '2020-03-15' from dual
)
, grouped_data as
( select primary_id,
secondary_id,
timestamp,
decode(first_value(primary_id) over(partition by secondary_id order by timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ), primary_id, 'Y', 'N') first_row_in_group,
decode(last_value(primary_id) over(partition by secondary_id order by timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), primary_id, 'Y', 'N') last_row_in_group
from demo_data
)
select primary_id, secondary_id, timestamp
from grouped_data s
where first_row_in_group = 'Y' or last_row_in_group = 'Y'
/
Upvotes: 0
Reputation: 5155
Please use below query,
select primary_id, timestamp, secondary_id,... from
(select primary_id, timestamp, secondary_id,...,
row_number() over (partition by secondary_id order by timestamp) as rnk1,
row_number() over (partition by secondary_id order by timestamp desc) as rnk2
from tablename where
timestamp <= to_timestamp('2020-07-29 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and
timestamp < to_timestamp('2020-07-29 04:00:00', 'YYYY-MM-DD HH24:MI:SS') ) qry
where rnk1=1 and rnk2 = 1
order by timestamp, secondary_id;
Upvotes: 0
Reputation: 222582
You can use window functions to rank records having the same secondary_id
by ascending and descending timestamp
, and then use that information to filter in the first and last record per group:
select primary_id, timestamp, secondary_id, ...
from (
select
t.*,
row_number() over(partition by secondary_id order by timestamp asc ) rn_asc,
row_number() over(partition by secondary_id order by timestamp desc) rn_desc
from tablename t
where
timestamp <= timestamp '2020-07-29 00:00:00'
and timestamp < timestamp '2020-07-29 04:00:00'
) t
where 1 in (rn_asc, rn_desc)
order by timestamp, secondary_id;
Note that you don't need to_timestamp()
to convert these literal strings: you can use literal dates instead.
Upvotes: 3