Reputation: 15
I have a sample table below:
+---------------------+---------+--------+
| timestamp | id | balance|
|---------------------+---------+--------|
| 2022-08-01 00:00:00 | 1 | 0.01 |
| 2022-08-01 00:00:00 | 2 | 0 |
| 2022-08-02 00:00:00 | 2 | 0.005 |
| 2022-08-02 07:00:00 | 2 | 0.5 |
| 2022-08-02 07:00:00 | 1 | 0.15 |
| 2022-08-03 00:00:00 | 1 | 0.02 |
| 2021-08-03 01:00:00 | 1 | 0.03 |
+---------------------+---------+--------+
And I want the following output:
+---------------------+---------+--------+
| timestamp | id | balance|
|---------------------+---------+--------|
| 2022-08-01 00:00:00 | 1 | 0.01 |
| 2022-08-01 00:00:00 | 2 | 0 |
| 2022-08-02 07:00:00 | 2 | 0.5 |
| 2022-08-02 07:00:00 | 1 | 0.15 |
| 2021-08-03 01:00:00 | 1 | 0.03 |
+---------------------+---------+--------+
I want to get the latest balance for each id, per day. Ideally the sql would use a window function as the source data table is very large (~900GB) and I need something efficient that will scale.
I've spent a lot of time trying to figure this out and keep getting stuck, any help would be greatly appreciated! Thanks in advance
Upvotes: 1
Views: 42
Reputation: 172994
Consider below simple approach
select * from your_table
qualify 1 = row_number() over id_day
window id_day as (partition by id, date(timestamp) order by timestamp desc)
Upvotes: 1