Reputation: 1
The context is we are migrating a large amount of data from a single tenant to a multi-tenant, so the need for querying.
We want to fetch the data in batches so we can pass it to the destination server. The batch will be configurable by 1 or 2 lc. The query being executed is given below:
select *
from schema.table_name
order by value->>'datetime_param_in_stringformat' ASC
LIMIT 100000 OFFSET 0
Now this is being called from a golang app, which is running for 300mil records.
datetime_param_in_stringformat
- the mentioned parameter doesn't have an index on it.
Edit 1: The main reason to use order by is to keep track of the 'datetime_param' so that in any failures in transferring the data, we can retrigger the job and be able to transfer only delta one's. This is in relation to shortening the downtime of the app as well. And to get the data in chronological order, we can only refer to time.
I tried restoring the snapshot with a higher plan and more storage, but no luck. As I can see from pg_activity, the wait event is bufFileRead. Tried to get the batch size to 50k but same issue. Or else sometimes it takes more that 15min, which is not feasible.
Upvotes: 0
Views: 890
Reputation: 246043
Your only chance is an index like
CREATE INDEX ON tab ((value->>'datetime_param_in_stringformat'));
Don't use SELECT *
, select only the columns you really need. The index will be particularly helpful for a query with ORDER BY
and LIMIT
, but it will also speed up ORDER BY
without LIMIT
.
To allow concurrent data modifications while the index is being built, use CREATE INDEX CONCURRENTLY
.
Upvotes: 2