user15373274
user15373274

Reputation:

How to join and keep only latest rows in SQL (Bigquery)

I'm using Bigquery. My dataset looks like the following. Basically, I started with an event table where a row is only added when data is updated. I exploded this table into the below by cross joining to a calendar table where my_date >= event_date. The problem I'm having now is when there are two event_dates per my_date (per tenancy_id), I only want to end up with the row for the latest one. (In this example, that means I want to DROP rows 4 & 12). What's the best way to achieve this?

Upvotes: 0

Views: 307

Answers (2)

You can use row_number() window function to get the latest row per my_date and tenancy_id.

 create table yourtable (Row int, my_date date, event_date date, tenancy_ld int, rent_period varchar(100), rent_amount int, tenancy_start_date date, tenancy_end_date date);

 insert into yourtable values(1,    '6/29/2021',    '6/29/2021', 1109,  'FORTN IGHTLY', 67794   ,'03/15/21','8/28/2021');
 insert into yourtable values(2,    '6/29/2021',    '6/29/2021', 1110,  'MONTHLY',      224364  ,'02/14/21','3/14/2022');
 insert into yourtable values(3,    '6/28/2021',    '6/28/2021', 1111,  'FORTN IGHTLY', 70000   ,'02/27/21','4/20/2022');
 insert into yourtable values(4,    '6/29/2021',    '6/28/2021', 1111,  'FORTN IGHTLY', 70000   ,'02/27/21','4/20/2022');
 insert into yourtable values(5,    '6/29/2021',    '6/29/2021', 1111,  'FORTN IGHTLY', 77056   ,'02/27/21','4/20/2022');
 insert into yourtable values(6,    '6/29/2021',    '6/29/2021', 1112,  'WEEKLY',       32830   ,'10/17/20','12/22/2021');
 insert into yourtable values(7,    '6/29/2021',    '6/29/2021', 1113,  'WEEKLY',       73620   ,'07/17/20','2/3/2022');
 insert into yourtable values(8,    '6/25/2021',    '6/25/2021', 1114,  'WEEKLY',       136     ,'09/19/20','4/26/2022');
 insert into yourtable values(9,    '6/26/2021',    '6/25/2021', 1114,  'WEEKLY',       136     ,'09/19/20','4/26/2022');
 insert into yourtable values(10,'6/27/2021',   '6/25/2021', 1114,  'WEEKLY',       136     ,'09/19/20','4/26/2022');
 insert into yourtable values(11,'6/28/2021',   '6/25/2021', 1114,  'WEEKLY',       136     ,'09/19/20','4/26/2022');
 insert into yourtable values(12,'6/29/2021',   '6/25/2021', 1114,  'WEEKLY',       136     ,'09/19/20','4/26/2022');
 insert into yourtable values(13,'6/29/2021',   '6/29/2021', 1114,  'FORTN IGHTLY', 136552  ,'09/19/20','4/26/2022');

Query:

 with cte as
 (
    select my_date, event_date, tenancy_ld, rent_period, rent_amount, 
    tenancy_start_date, tenancy_end_date, row_number()over(partition by tenancy_ld, 
    my_date order by event_date desc) rnk
    from yourtable
 )
 select my_date, event_date, tenancy_ld, rent_period, rent_amount, 
    tenancy_start_date, tenancy_end_date from cte where rnk=1

Output:

my_date event_date tenancy_ld rent_period rent_amount tenancy_start_date tenancy_end_date
2021-06-29 2021-06-29 1109 FORTN IGHTLY 67794 2021-03-15 2021-08-28
2021-06-29 2021-06-29 1110 MONTHLY 224364 2021-02-14 2022-03-14
2021-06-28 2021-06-28 1111 FORTN IGHTLY 70000 2021-02-27 2022-04-20
2021-06-29 2021-06-29 1111 FORTN IGHTLY 77056 2021-02-27 2022-04-20
2021-06-29 2021-06-29 1112 WEEKLY 32830 2020-10-17 2021-12-22
2021-06-29 2021-06-29 1113 WEEKLY 73620 2020-07-17 2022-02-03
2021-06-25 2021-06-25 1114 WEEKLY 136 2020-09-19 2022-04-26
2021-06-26 2021-06-25 1114 WEEKLY 136 2020-09-19 2022-04-26
2021-06-27 2021-06-25 1114 WEEKLY 136 2020-09-19 2022-04-26
2021-06-28 2021-06-25 1114 WEEKLY 136 2020-09-19 2022-04-26
2021-06-29 2021-06-29 1114 FORTN IGHTLY 136552 2020-09-19 2022-04-26

db<>fiddle here

(It's implemented on SQL Server but should work with BigQuery)

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269443

I would recommend using qualify:

select t.*
from t
where 1=1
qualify row_number() over (partition by tenancy_id, event_date
                           order by my_date desc
                          ) = 1;

Upvotes: 0

Related Questions