Fetching rows not updated within last 24 hours

Question

I have a large table (40+ million records) with a structure like the following:

CREATE TABLE collected_data(
    id TEXT NOT NULL,
    status TEXT NOT NULL,
    PRIMARY KEY(id, status),
    blob JSONB,
    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);

I need to fetch all (or at least 100,000) records of a certain status whose updated_at is older than 24 hours and whose blob is not null.

So the query becomes:

SELECT
    id
FROM
    collected_data
WHERE
    status = 'waiting'
    AND blob IS NOT NULL
    AND updated_at < NOW() - '24 hours'::interval
LIMIT 100000;

This produces an execution plan like the following:

Limit  (cost=0.00..234040.07 rows=100000 width=12)
  ->  Seq Scan on collected_data  (cost=0.00..59236150.00 rows=25310265 width=12)
        Filter: ((blob IS NOT NULL) AND (type = 'waiting'::text) AND (updated_at >= (now() - '24:00:00'::interval)))

It almost always results in a full table scan, which means that some queries are really slow.

I have tried creating indexes such as CREATE INDEX idx_special ON collected_data(status, updated_at); but they do not help.

Is there any way I can make this query faster?
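
For comparison with the (status, updated_at) index mentioned above, one variation that may be worth testing is a partial index whose predicate mirrors the constant parts of the WHERE clause. This is an untested sketch, not a confirmed fix: the index name idx_waiting_with_blob is hypothetical, and the planner may still choose a sequential scan when it estimates that a large share of the table matches, as the rows=25310265 estimate in the plan suggests.

```sql
-- Hypothetical index name; the predicate repeats the constant filters from the query.
-- NOW() cannot appear in an index predicate, so updated_at is the indexed column instead.
CREATE INDEX idx_waiting_with_blob
    ON collected_data (updated_at)
    WHERE status = 'waiting' AND blob IS NOT NULL;

-- Refresh planner statistics so the new index is costed against current data.
ANALYZE collected_data;

-- Re-check the plan; an ORDER BY on updated_at is added here as an assumption,
-- so that the LIMIT can be satisfied by walking the index from the oldest row.
EXPLAIN
SELECT id
FROM collected_data
WHERE status = 'waiting'
  AND blob IS NOT NULL
  AND updated_at < NOW() - '24 hours'::interval
ORDER BY updated_at
LIMIT 100000;
```

On a table of this size, CREATE INDEX CONCURRENTLY avoids blocking writes while the index builds, at the cost of a slower build.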
