ryoaska
ryoaska

Reputation: 327

Query to get records with closest timestamp values for unique combination of two columns

+-------+----------------------+----------+------------------+
| isbn  | book_container_id    | shelf_id |   update_time    |
+-------+----------------------+----------+------------------+
|   555 |                    6 | shelf100 | 11/15/2015 19:10 |
|   123 |                    1 | shelf1   | 11/28/2015 8:00  |
|   555 |                    4 | shelf5   | 11/28/2015 9:10  |
|   212 |                    2 | shelf2   | 11/29/2015 8:10  |
|   555 |                    6 | shelf9   | 11/30/2015 22:10 |
|   321 |                    8 | shelf7   | 11/30/2015 8:10  |
|   555 |                    4 | shelf33  | 12/1/2015 7:00   |
+-------+----------------------+----------+------------------+

Let's say I have a table (PostgreSQL) like the above called bookshelf_configuration. If I'm given an ISBN and a timestamp, I want to be able to find the closest (before only) records for each unique combination of isbn and book_container_id.

So if I'm looking at isbn '555', with a timestamp of '12/1/2015 7:00', I should get back:

+-------+----------------------+----------+------------------+
| isbn  | book_container_id    | shelf_id |   update_time    |
+-------+----------------------+----------+------------------+
|   555 |                    6 | shelf9   | 11/30/2015 22:10 |
|   555 |                    4 | shelf33  | 12/1/2015 7:00   |
+-------+----------------------+----------+------------------+

My knowledge of SQL is extremely basic. I've got a query that would work if I only had to factor in isbn, but I need some help understanding how to do this for the combination (isbn, book_container_id).

Upvotes: 1

Views: 1361

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656882

A typical use case for DISTINCT ON:

SELECT DISTINCT ON (book_container_id)
       isbn, book_container_id, shelf_id, update_time 
FROM   bookshelf_configuration
WHERE  isbn = 555
AND    update_time <= '2015-12-01 07:00'  -- ISO 8601 format
ORDER  BY book_container_id, update_time DESC;

Assuming update_time is defined NOT NULL, or you have to add NULLS LAST. Detailed explanation:

Depending on cardinalities and value frequencies there may be even faster query styles:

Either way, a multicolumn index on (isbn, book_container_id, update_time DESC) is the key to make this fast for tables of non-trivial size. Sort order should match the query (or be it's complete inversion). If you add NULLS LAST to the query, add it to the index as well.

Aside: It's better to use ISO 8601 format for all date / time constants, since that is unambiguous with any locale or datestyle setting. Related:

Upvotes: 2

JamieD77
JamieD77

Reputation: 13949

There is a thing called Row_Number that can help you here.

Select * 
From (
    Select *,
           row_number() OVER (partition by isbn, book_container_id order by update_time desc) rn
    From   bookshelf_configuration
    Where  isbn = 555 and update_time <= '12/1/2015 7:00'   
) q 
Where q.rn = 1

Upvotes: 2

Related Questions