Reputation: 523
I have a PostgreSQL database with 1.2 billion rows and am building an application that queries a million rows at a time, with an option to query larger intervals.
At first I was querying tables of one to ten million rows with ease; now that I'm paging through the full table with an OFFSET, generating the ResultSet takes a long time.
// ...
stmt.setFetchSize(100000);
ResultSet rs = stmt.executeQuery("SELECT mmsi, report_timestamp, position_geom, ST_X(position_geom) AS Long, "
+ "ST_Y(position_geom) AS Lat FROM reports4 WHERE position_geom IS NOT NULL ORDER by report_timestamp ASC LIMIT "
+ limit + " OFFSET " + set);
So the ORDER BY is probably killing my execution time, but having the rows ordered makes things easier later on. Is there a more efficient way to query the rows in intervals?
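For reference, the same OFFSET paging can be written with a PreparedStatement instead of string concatenation (a minimal sketch; `conn` stands for an assumed already-open java.sql.Connection). Note that the PostgreSQL JDBC driver only honors setFetchSize when autocommit is off and the ResultSet is forward-only; otherwise the whole result is materialized at once:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class OffsetPage {
    // Parameterized version of the query from the question.
    static String offsetSql() {
        return "SELECT mmsi, report_timestamp, position_geom, "
             + "ST_X(position_geom) AS long, ST_Y(position_geom) AS lat "
             + "FROM reports4 WHERE position_geom IS NOT NULL "
             + "ORDER BY report_timestamp ASC LIMIT ? OFFSET ?";
    }

    // Sketch only: fetches one page; "conn" is an assumed open connection.
    static ResultSet fetchPage(Connection conn, long limit, long offset)
            throws SQLException {
        conn.setAutoCommit(false);   // required for fetchSize streaming in PgJDBC
        PreparedStatement ps = conn.prepareStatement(offsetSql());
        ps.setFetchSize(100_000);    // stream rows in 100k chunks
        ps.setLong(1, limit);
        ps.setLong(2, offset);
        return ps.executeQuery();
    }

    public static void main(String[] args) {
        System.out.println(offsetSql());
    }
}
```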
Upvotes: 3
Views: 681
Reputation: 1406
You could use a partial index that is built over just the subset of the table your query touches:
CREATE INDEX idx_reports4 ON reports4(report_timestamp) WHERE position_geom IS NOT NULL;
This should improve performance considerably: the index predicate matches the query's WHERE clause, the index order matches the ORDER BY on report_timestamp, and only the rows you actually need are indexed.
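To confirm the planner actually picks the partial index, you can run the paged query under EXPLAIN (a sketch; it only builds and prints the statement, and the connection handling through Statement.executeQuery is left out):

```java
public class ExplainCheck {
    // The paged query from the question; the planner can only use the partial
    // index when the query's WHERE clause implies the index predicate
    // (position_geom IS NOT NULL here).
    static String pagedSql() {
        return "SELECT mmsi, report_timestamp, position_geom, "
             + "ST_X(position_geom) AS long, ST_Y(position_geom) AS lat "
             + "FROM reports4 WHERE position_geom IS NOT NULL "
             + "ORDER BY report_timestamp ASC LIMIT 1000000 OFFSET 0";
    }

    public static void main(String[] args) {
        // Execute this through Statement.executeQuery(...) and look for
        // "Index Scan using idx_reports4" in the returned plan rows.
        System.out.println("EXPLAIN " + pagedSql());
    }
}
```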
Upvotes: 2
Reputation: 1269773
For this query:
SELECT mmsi, report_timestamp, position_geom,
       ST_X(position_geom) AS long,
       ST_Y(position_geom) AS lat
FROM reports4
WHERE position_geom IS NOT NULL
ORDER BY report_timestamp ASC;
You should be able to use an index on an expression:
CREATE INDEX idx_reports4_position_ts ON reports4((position_geom IS NOT NULL), report_timestamp);
This index matches both the WHERE clause and the ORDER BY, so it should be used directly for the query.
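Even with an index, a large OFFSET still has to walk and discard every skipped row. Since the results are already ordered by report_timestamp, a common alternative is keyset (seek) pagination: remember the last timestamp of the previous page and filter on it. A sketch (assumes report_timestamp values are distinct; ties would need an extra tiebreaker column in both the ORDER BY and the WHERE clause):

```java
public class KeysetPage {
    // Keyset pagination: instead of OFFSET, start each page right after the
    // last report_timestamp seen on the previous page.
    static String nextPageSql() {
        return "SELECT mmsi, report_timestamp, position_geom, "
             + "ST_X(position_geom) AS long, ST_Y(position_geom) AS lat "
             + "FROM reports4 "
             + "WHERE position_geom IS NOT NULL AND report_timestamp > ? "
             + "ORDER BY report_timestamp ASC LIMIT ?";
    }

    public static void main(String[] args) {
        // Bind the previous page's last timestamp to the first parameter
        // (setTimestamp) and the page size to the second (setLong).
        System.out.println(nextPageSql());
    }
}
```

With the index above, PostgreSQL should be able to seek straight to the first qualifying timestamp, so each page costs about the same no matter how deep into the table it is.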
Upvotes: 3