Reputation:
I'm using Bigquery. My dataset looks like the following. Basically, I started with an event table where a row is only added when data is updated. I exploded this table into the below by cross joining to a calendar table where my_date >= event_date. The problem I'm having now is when there are two event_dates per my_date (per tenancy_id), I only want to end up with the row for the latest one. (In this example, that means I want to DROP rows 4 & 12). What's the best way to achieve this?
Upvotes: 0
Views: 307
Reputation: 15893
You can use row_number() window function to get the latest row per my_date and tenancy_id.
create table yourtable (Row int, my_date date, event_date date, tenancy_ld int, rent_period varchar(100), rent_amount int, tenancy_start_date date, tenancy_end_date date);
insert into yourtable values(1, '6/29/2021', '6/29/2021', 1109, 'FORTN IGHTLY', 67794 ,'03/15/21','8/28/2021');
insert into yourtable values(2, '6/29/2021', '6/29/2021', 1110, 'MONTHLY', 224364 ,'02/14/21','3/14/2022');
insert into yourtable values(3, '6/28/2021', '6/28/2021', 1111, 'FORTN IGHTLY', 70000 ,'02/27/21','4/20/2022');
insert into yourtable values(4, '6/29/2021', '6/28/2021', 1111, 'FORTN IGHTLY', 70000 ,'02/27/21','4/20/2022');
insert into yourtable values(5, '6/29/2021', '6/29/2021', 1111, 'FORTN IGHTLY', 77056 ,'02/27/21','4/20/2022');
insert into yourtable values(6, '6/29/2021', '6/29/2021', 1112, 'WEEKLY', 32830 ,'10/17/20','12/22/2021');
insert into yourtable values(7, '6/29/2021', '6/29/2021', 1113, 'WEEKLY', 73620 ,'07/17/20','2/3/2022');
insert into yourtable values(8, '6/25/2021', '6/25/2021', 1114, 'WEEKLY', 136 ,'09/19/20','4/26/2022');
insert into yourtable values(9, '6/26/2021', '6/25/2021', 1114, 'WEEKLY', 136 ,'09/19/20','4/26/2022');
insert into yourtable values(10,'6/27/2021', '6/25/2021', 1114, 'WEEKLY', 136 ,'09/19/20','4/26/2022');
insert into yourtable values(11,'6/28/2021', '6/25/2021', 1114, 'WEEKLY', 136 ,'09/19/20','4/26/2022');
insert into yourtable values(12,'6/29/2021', '6/25/2021', 1114, 'WEEKLY', 136 ,'09/19/20','4/26/2022');
insert into yourtable values(13,'6/29/2021', '6/29/2021', 1114, 'FORTN IGHTLY', 136552 ,'09/19/20','4/26/2022');
Query:
with cte as
(
select my_date, event_date, tenancy_ld, rent_period, rent_amount,
tenancy_start_date, tenancy_end_date, row_number()over(partition by tenancy_ld,
my_date order by event_date desc) rnk
from yourtable
)
select my_date, event_date, tenancy_ld, rent_period, rent_amount,
tenancy_start_date, tenancy_end_date from cte where rnk=1
Output:
my_date | event_date | tenancy_ld | rent_period | rent_amount | tenancy_start_date | tenancy_end_date |
---|---|---|---|---|---|---|
2021-06-29 | 2021-06-29 | 1109 | FORTN IGHTLY | 67794 | 2021-03-15 | 2021-08-28 |
2021-06-29 | 2021-06-29 | 1110 | MONTHLY | 224364 | 2021-02-14 | 2022-03-14 |
2021-06-28 | 2021-06-28 | 1111 | FORTN IGHTLY | 70000 | 2021-02-27 | 2022-04-20 |
2021-06-29 | 2021-06-29 | 1111 | FORTN IGHTLY | 77056 | 2021-02-27 | 2022-04-20 |
2021-06-29 | 2021-06-29 | 1112 | WEEKLY | 32830 | 2020-10-17 | 2021-12-22 |
2021-06-29 | 2021-06-29 | 1113 | WEEKLY | 73620 | 2020-07-17 | 2022-02-03 |
2021-06-25 | 2021-06-25 | 1114 | WEEKLY | 136 | 2020-09-19 | 2022-04-26 |
2021-06-26 | 2021-06-25 | 1114 | WEEKLY | 136 | 2020-09-19 | 2022-04-26 |
2021-06-27 | 2021-06-25 | 1114 | WEEKLY | 136 | 2020-09-19 | 2022-04-26 |
2021-06-28 | 2021-06-25 | 1114 | WEEKLY | 136 | 2020-09-19 | 2022-04-26 |
2021-06-29 | 2021-06-29 | 1114 | FORTN IGHTLY | 136552 | 2020-09-19 | 2022-04-26 |
db<>fiddle here
(It's implemented on SQL Server but should work with BigQuery)
Upvotes: 1
Reputation: 1269443
I would recommend using qualify
:
select t.*
from t
where 1=1
qualify row_number() over (partition by tenancy_id, event_date
order by my_date desc
) = 1;
Upvotes: 0