SQLite Query Optimization (using LIMIT and OFFSET)

Following is the query I use for getting a fixed number of records from a database with millions of records:

select * from myTable LIMIT 100 OFFSET 0

What I observed is that when the offset is very high, say 90000, the query takes much more time to execute. Following is the time difference between two queries with different offsets:

select * from myTable LIMIT 100 OFFSET 0       //Execution Time is less than 1sec
select * from myTable LIMIT 100 OFFSET 95000   //Execution Time is almost 15secs

Can anyone suggest how to optimize this query? I mean, the query execution time should be equally fast for any number of records I wish to retrieve, from any OFFSET.

Newly added: The actual scenario is that I have a database with more than 1 million records. Since it's an embedded device, I just can't do "select * from myTable" and then fetch all the records from the query; my device crashes. Instead, I keep fetching records batch by batch (batch size = 100 or 1000 records) with the query mentioned above. But as I mentioned, it becomes slow as the offset increases. So, my ultimate aim is to read all the records from the database, but since I can't fetch them all in a single execution, I need some other efficient way to achieve this.
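To make the scenario concrete, here is a minimal sketch (my illustration, not part of the original question) of that batched LIMIT/OFFSET loop using the sqlite3 C API; the table name myTable is from the question, and process_row() is a hypothetical placeholder for whatever the device does with one record:

#include <sqlite3.h>

/* Sketch only: fetch the whole table in fixed-size batches with
   LIMIT/OFFSET -- exactly the pattern described above, whose skip
   cost grows as the offset grows. */
static void fetch_all_in_batches(sqlite3 *db, int batch_size)
{
    sqlite3_stmt *stmt = NULL;
    const char *sql = "SELECT * FROM myTable LIMIT ?1 OFFSET ?2";
    int offset = 0;

    if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) != SQLITE_OK)
        return;

    for (;;) {
        int rows = 0;
        sqlite3_bind_int(stmt, 1, batch_size);
        sqlite3_bind_int(stmt, 2, offset);

        while (sqlite3_step(stmt) == SQLITE_ROW) {
            /* process_row(stmt);  -- hypothetical per-record handling */
            rows++;
        }
        sqlite3_reset(stmt);

        if (rows < batch_size)
            break;              /* last (possibly partial) batch reached */
        offset += batch_size;   /* skipping this many rows gets slower each time */
    }
    sqlite3_finalize(stmt);
}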

Upvotes: 21

Views: 13965

Answers (6)

Alejadro Xalabarder

Reputation: 1630

There are two situations for this offset/limit problem (e.g. myOffset, myLimit):

1 - The offset can be based on a numeric, unique and consecutive field

If this field is "id", then either of these two filters can do the job:

        ... WHERE id >= myOffset LIMIT myLimit
  or
        ... WHERE id >= myOffset AND id < myOffset + myLimit

For a table containing no explicit numeric unique field, rowid - which is created automatically by SQLite - is the perfect candidate for such an id.

But even in this case we might have gaps in the ids, for instance when records are deleted. So the filters could return repeated records (the first form, when myOffset is advanced by myLimit) or fewer than myLimit records (the second form).

2 - The offset cannot be based directly on any column of the table

In this case there is no other way than to use LIMIT/OFFSET and accept the cost of big offsets. SQLite cannot optimize this, because it has to go through all the previous records just to count them.

A solution for more or less static tables:

If the huge table in question is not changing much, or we can afford the extra time of the following query each time the table changes:

  DROP TABLE IF EXISTS tmpOffsetMap ;
  CREATE TABLE tmpOffsetMap AS SELECT rowid AS tableRowid FROM table ;
           

This would create a table with as many records as the huge table "table", but with only two columns: rowid and tableRowid.

Then the select over "table" for myOffset and myLimit would look like:

  SELECT * FROM table WHERE rowid IN (SELECT tableRowid FROM tmpOffsetMap WHERE rowid >= myOffset AND rowid < myOffset + myLimit)
  

note that "rowid" are different columns and have different values in tables "table" and "tmpOffsetMap"

Upvotes: 0

Srdjan

Reputation: 108

The expensive OFFSET skip is done in a subquery that reads only rowid values, so the full rows are never materialized while skipping; the outer query then fetches the matching row directly by rowid:

select * from data where rowid = (select rowid from data limit 1 offset 999999);

Upvotes: 3

P-Gn

Reputation: 24641

As @user318750 said, if you know you have a contiguous index, you can simply use

select * from Table where index >= %start and index < %(start+size)

However, those cases are rare. If you don't want to rely on that assumption, use a sub-query, for example using rowid, which is always indexed,

select * from Table where rowid in (
  select rowid from Table limit %size offset %start)

This speeds things up especially if you have "fat" rows (e.g. that contain blobs).

If maintaining the record order is important (it usually isn't), you need to order the indices first:

select * from Table where rowid in (
  select rowid from Table order by rowid limit %size offset %start)

Upvotes: 11

With SQLite, you don't need to get all rows returned at once in a big fat array; you can get called back for every row. This way, you can process the results as they come in, which should address both your crashing and performance issues.

I guess you're not using C, as you would already be using a callback, but this technique should be available in any other language.

JavaScript example (from https://www.npmjs.com/package/sqlite3):

db.each("SELECT rowid AS id, info FROM lorem", function(err, row) {
    console.log(row.id + ": " + row.info);
});
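For the C case mentioned above, a minimal sketch of the same row-by-row technique could use sqlite3_exec() and its per-row callback (the lorem table and columns are taken from the JavaScript example; error handling kept minimal):

#include <sqlite3.h>
#include <stdio.h>

/* Sketch: each row is handled as it is produced, so no big result
   array is ever built in memory. */
static int on_row(void *unused, int argc, char **argv, char **col_names)
{
    (void)unused;
    /* argv[i] is the text value of column i, or NULL */
    for (int i = 0; i < argc; i++)
        printf("%s = %s\n", col_names[i], argv[i] ? argv[i] : "NULL");
    printf("\n");
    return 0;   /* returning non-zero would abort the query */
}

int dump_lorem(sqlite3 *db)
{
    char *errmsg = NULL;
    int rc = sqlite3_exec(db, "SELECT rowid AS id, info FROM lorem",
                          on_row, NULL, &errmsg);
    if (rc != SQLITE_OK) {
        fprintf(stderr, "query failed: %s\n", errmsg);
        sqlite3_free(errmsg);
    }
    return rc;
}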

Upvotes: 1

user318750

Reputation: 386

As JvdBerg said, indexes are not used for LIMIT/OFFSET. Simply adding 'ORDER BY indexed_field' will not help either.

To speed up pagination you should avoid LIMIT/OFFSET and use a WHERE clause instead. For example, if your primary key field is named 'id' and has no gaps, then your code above can be rewritten like this:

SELECT * FROM myTable WHERE id>=0     AND id<100     //very fast!
SELECT * FROM myTable WHERE id>=95000 AND id<95100   //as fast as previous line!

Upvotes: 10

JvdBerg

Reputation: 21856

By doing a query with an offset of 95000, all of the previous 95000 records are processed. You should create an index on the table, and use that for selecting records.

Upvotes: 7
