TreeWater

Reputation: 867

Best way to structure timescale table to get most recent value for multiple keys

I'm brand new to TimescaleDB. I can do this in plain SQL fine, but I'm not sure whether there is a better way to do it in TimescaleDB.

I have the following table

| DATE                | MERCHANT_ID | COUNTRY_CODE | PRODUCT_ID | INVENTORY |
|---------------------|-------------|--------------|------------|-----------|
| 2024-03-20 08:00:00 | 1           | US           | 1          | 150       |
| 2024-03-20 08:00:00 | 1           | US           | 2          | 220       |
| 2024-03-20 08:00:00 | 1           | US           | 3          | 380       |
| 2024-03-20 08:00:00 | 1           | US           | 4          | 400       |
| 2024-03-20 09:00:00 | 1           | US           | 1          | 180       |
| 2024-03-20 09:00:00 | 1           | US           | 2          | 200       |
| 2024-03-20 09:00:00 | 1           | US           | 3          | 310       |
| 2024-03-20 09:00:00 | 1           | US           | 4          | 440       |
| 2024-03-20 10:00:00 | 1           | US           | 1          | 120       |
| 2024-03-20 10:00:00 | 1           | US           | 2          | 240       |
| 2024-03-20 10:00:00 | 1           | US           | 3          | 360       |
| 2024-03-20 10:00:00 | 1           | US           | 4          | 410       |
| 2024-03-20 11:00:00 | 1           | US           | 1          | 130       |
| 2024-03-20 11:00:00 | 1           | US           | 2          | 260       |
| 2024-03-20 11:00:00 | 1           | US           | 3          | 390       |
| 2024-03-20 11:00:00 | 1           | US           | 4          | 520       |
| 2024-03-20 08:00:00 | 2           | UK           | 1          | 150       |
| 2024-03-20 08:00:00 | 2           | UK           | 2          | 250       |
| 2024-03-20 08:00:00 | 2           | UK           | 3          | 350       |
| 2024-03-20 08:00:00 | 2           | UK           | 4          | 450       |
| 2024-03-20 09:00:00 | 2           | UK           | 1          | 160       |
| 2024-03-20 09:00:00 | 2           | UK           | 2          | 270       |
| 2024-03-20 09:00:00 | 2           | UK           | 3          | 380       |
| 2024-03-20 09:00:00 | 2           | UK           | 4          | 490       |
| 2024-03-20 10:00:00 | 2           | UK           | 1          | 170       |
| 2024-03-20 10:00:00 | 2           | UK           | 2          | 280       |
| 2024-03-20 10:00:00 | 2           | UK           | 3          | 390       |
| 2024-03-20 10:00:00 | 2           | UK           | 4          | 500       |
| 2024-03-20 11:00:00 | 2           | UK           | 1          | 180       |
| 2024-03-20 11:00:00 | 2           | UK           | 2          | 290       |
| 2024-03-20 11:00:00 | 2           | UK           | 3          | 400       |
| 2024-03-20 11:00:00 | 2           | UK           | 4          | 510       |

What I'm trying to do is select the most recent inventory number for each key (MERCHANT_ID, COUNTRY_CODE, PRODUCT_ID), where PRODUCT_ID comes from some list. A key may go without a new inventory record for an undetermined amount of time, and this table will likely grow to multiple millions of rows.

So for example, from the table above, if I am looking at `MERCHANT_ID=1, COUNTRY_CODE=US, PRODUCT_ID in (1, 2, 3)`, I would want:

| MERCHANT_ID | COUNTRY_CODE | PRODUCT_ID | INVENTORY |
|-------------|--------------|------------|-----------|
| 1           | US           | 1          | 130       |
| 1           | US           | 2          | 260       |
| 1           | US           | 3          | 390       |

What would be the most performant way to get this data? I'm working with a blank slate: currently I only have an index on the date column, so I can add indexes or create continuous aggregate materialized views if that is better. I will eventually be building a time series from this table, but for now I only want the most recent value.

Upvotes: 0

Views: 80

Answers (1)

jonatasdp

Reputation: 1412

First and simplest attempt:

Add a composite index on time, MERCHANT_ID, COUNTRY_CODE, PRODUCT_ID.
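A minimal sketch of what that could look like, assuming the table is named `inventory` with lowercase column names (`date`, `merchant_id`, `country_code`, `product_id`, `inventory`); those names are my assumptions, not something from the question. The column order here, key columns first and the time column last in descending order, is also my choice: it is a common layout for "latest row per key" lookups, but it is worth checking with `EXPLAIN ANALYZE` on your own data.

```sql
-- Assumed table/column names; adjust to your schema.
-- Key columns first, time last (descending), so the newest row
-- for each key sits at the start of that key's index range.
CREATE INDEX inventory_key_date_idx
    ON inventory (merchant_id, country_code, product_id, date DESC);

-- Latest inventory per (merchant, country, product) for a list of products.
-- DISTINCT ON keeps the first row of each group in ORDER BY order,
-- which is the most recent one because of "date DESC".
SELECT DISTINCT ON (merchant_id, country_code, product_id)
       merchant_id, country_code, product_id, inventory
FROM inventory
WHERE merchant_id = 1
  AND country_code = 'US'
  AND product_id IN (1, 2, 3)
ORDER BY merchant_id, country_code, product_id, date DESC;
```

Because there is no time predicate, the query may have to look into every chunk of a hypertable; if you know an upper bound on how stale a record can be, adding a `date > now() - interval '...'` filter should let the planner skip older chunks.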

If that's not enough, you can build a very lean hypertable with just three columns: time, identifier, value.

In your case the value is the inventory, and the identifier is a composition of the other columns.
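As a rough sketch of that lean layout, assuming a table named `inventory_lean` and an identifier built as `merchant:country:product` (both are hypothetical choices for illustration; any stable encoding of the key would do):

```sql
-- Hypothetical lean table: one narrow row per reading.
CREATE TABLE inventory_lean (
    time       TIMESTAMPTZ NOT NULL,
    identifier TEXT        NOT NULL,  -- assumed format, e.g. '1:US:3'
    value      INTEGER     NOT NULL   -- the inventory number
);

-- Turn it into a hypertable partitioned on the time column.
SELECT create_hypertable('inventory_lean', 'time');

-- Index for "latest value per identifier" lookups.
CREATE INDEX inventory_lean_id_time_idx
    ON inventory_lean (identifier, time DESC);

-- Most recent value for a list of identifiers.
SELECT DISTINCT ON (identifier) identifier, value
FROM inventory_lean
WHERE identifier IN ('1:US:1', '1:US:2', '1:US:3')
ORDER BY identifier, time DESC;
```

The trade-off is that the application has to build and parse the identifier, in exchange for a narrower table and a single-column key in the index.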

I covered several models in this video if you want to learn more: https://www.youtube.com/watch?v=sUgXZwK_sz0

Upvotes: 0
