Nick Woolmer
Nick Woolmer

Reputation: 141

Why does QuestDB still touch all my table partitions when I use a timestamp filter?

I have a QuestDB table like this:

CREATE TABLE 'trades' (
  symbol SYMBOL,
  side SYMBOL,
  price DOUBLE,
  amount DOUBLE,
  timestamp TIMESTAMP,
  ingestion_time TIMESTAMP
) timestamp (timestamp) PARTITION BY DAY WAL;

with several hundred million rows. The rows are split over years of time. I try to do a simple query like this:

SELECT ingestion_time, count() from trades
WHERE ingestion_time BETWEEN '2024-01' AND '2024-02'
SAMPLE BY 1d

But the query is very slow. When I look at the logs, it seems to be opening lots of partitions that aren't in this time range. I thought that by giving a time range, it will only read data in that range and not everything?

How can I fix this?

Upvotes: 0

Views: 24

Answers (2)

Ayan Khan
Ayan Khan

Reputation: 1

Broad Filtering: Improper timestamp filtering does not restrict the search to the partitions.

Plan of Execution: A poorly structured query is probably the reason for the redundant partition scans.

Indexing Problems: If you don't index the timestamp in the right manner, you can slow down the process of data retrieval and can become a bottleneck.

Upvotes: 0

Nick Woolmer
Nick Woolmer

Reputation: 141

QuestDB has the concept of a designated timestamp, which corresponds to the ordering of the table. If you query using this timestamp, then QuestDB will use an interval scan and avoid touching partitions not within this range. If you run your query with EXPLAIN, you can see this inefficient filtering:

original query plan

In this plan, the query uses Row forward scan, which is a full table scan. Then it filters this for ingestion_time.

This is because, you have set the designated timestamp to be timestamp, not ingestion_time, when you created the table using the timestamp (timestamp) clause. You should instead use the timestamp column for time filtering, or re-create the table using ingestion_time as the ordering timestamp.

If you run the following query with EXPLAIN instead:

SELECT ingestion_time, count() from trades
WHERE timestamp BETWEEN '2024-01' AND '2024-02'
SAMPLE BY 1d

revised query plan

You can see that the filter has been removed (filter: null) and instead the database uses an Interval forward scan over the specified time range. This will execute much faster and only touch data the that is required.

Upvotes: 0

Related Questions