Jerry Fernholz

Reputation: 695

Pagination Strategies for Complex (slow) Datasets

What are some of the strategies being used for pagination of data sets that involve complex queries? count(*) takes ~1.5 sec, so we don't want to hit the DB on every page view. The query currently returns ~45k rows.

Here are some of the approaches I've considered:

Upvotes: 6

Views: 1114

Answers (5)

ntd

Reputation: 7434

MySQL has a specific mechanism to get the full count of a result set without re-running the query minus its LIMIT clause: add SQL_CALC_FOUND_ROWS to the paginated SELECT, then read the total with FOUND_ROWS().
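
A minimal sketch of the mechanism, using hypothetical column and placeholder names (id, title, col1, :myvalue):

SELECT SQL_CALC_FOUND_ROWS id, title
FROM   mytable
WHERE  col1 = :myvalue
ORDER  BY id
LIMIT  20 OFFSET 40;   -- page 3 at 20 rows per page

-- Must run on the same connection, immediately after the query above;
-- returns the number of rows the SELECT would have matched without LIMIT.
SELECT FOUND_ROWS();

Note that MySQL still has to locate every matching row to produce that figure, so this saves the second round trip rather than the scan itself.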

Upvotes: 2

memnoch_proxy

Reputation: 1954

I've had to engineer a few pagination strategies using PHP and MySQL for a site that does over a million page views a day. I pursued the strategy in stages:

Multi-column indexes. I should have done this first, before attempting a materialized view.
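
For instance, a sketch under assumed column names (a page query filtering on col1 and sorting on created_at):

-- A composite index covering the WHERE column(s) and the ORDER BY column
-- lets MySQL walk the index in sort order and stop at the LIMIT,
-- avoiding a filesort over the whole result set.
CREATE INDEX idx_mytable_col1_created
    ON mytable (col1, created_at);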

Generating a materialized view. I created a cron job that did a common denormalization of the document tables I was using. I would SELECT ... INTO OUTFILE ..., then create a new table, load it, and rotate it in:

-- Dump the denormalized result set to a flat file.
SELECT ... INTO OUTFILE '/tmp/ondeck.txt' FROM mytable ...;
-- Build a fresh "on deck" copy with the same structure and load it.
CREATE TABLE ondeck_mytable LIKE mytable;
LOAD DATA INFILE '/tmp/ondeck.txt' INTO TABLE ondeck_mytable...;
-- Retire the old generation, then atomically swap the new one in.
DROP TABLE IF EXISTS dugout_mytable;
RENAME TABLE atbat_mytable TO dugout_mytable, ondeck_mytable TO atbat_mytable;

This kept the lock time on the write-contended mytable to a minimum, and the pagination queries could hammer away at the atbat materialized view. I've simplified the above, leaving out the actual data manipulation, which is unimportant.

Memcache. I then created a wrapper around my database connection to cache these paginated results in memcache. This was a huge performance win. However, it was still not good enough.

Batch generation. I wrote a PHP daemon and extracted the pagination logic into it. It would detect changes to mytable and periodically regenerate all the affected pages, from the oldest changed record to the most recent, writing them to the webserver's filesystem. With a bit of mod_rewrite, I could check whether a page existed on disk and serve it directly. This also let me take effective advantage of reverse proxying, by letting Apache detect If-Modified-Since headers and respond with 304 response codes. (Obviously, I removed any option of allowing users to select the number of results per page, an unimportant feature.)

Updated: RE count(*): When using MyISAM tables, COUNT didn't create a problem once I was able to reduce the amount of read-write contention on the table. If I were doing InnoDB, I would create a trigger that maintained the row count in an adjacent table, adding 1 on INSERT and subtracting 1 on DELETE.
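
A minimal sketch of that trigger idea, assuming a hypothetical single-row counter table mytable_count alongside the InnoDB table:

-- Seed the counter with the current row count.
CREATE TABLE mytable_count (n INT UNSIGNED NOT NULL) ENGINE=InnoDB;
INSERT INTO mytable_count SELECT COUNT(*) FROM mytable;

-- Keep the counter in step with every insert and delete.
CREATE TRIGGER mytable_count_ins AFTER INSERT ON mytable
    FOR EACH ROW UPDATE mytable_count SET n = n + 1;

CREATE TRIGGER mytable_count_del AFTER DELETE ON mytable
    FOR EACH ROW UPDATE mytable_count SET n = n - 1;

-- The paginator then reads the total cheaply:
SELECT n FROM mytable_count;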

RE page-pickers (thumbwheels): When I moved to aggressive query caching, thumbwheel queries were also cached, and when it came to batch-generating the pages, I was using temporary tables, so computing the thumbwheel was no problem. A lot of the thumbwheel calculation simplified, because it became a predictable filesystem pattern that actually only needed the largest page number. The smallest page number was always 1.
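
The largest page number is a one-line query against the materialized view; the 20-rows-per-page figure here is an assumption for illustration:

-- CEIL(row count / page size) yields the highest page number.
SELECT CEIL(COUNT(*) / 20) AS max_page FROM atbat_mytable;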

Windowed thumbwheel. The example you give above for a windowed thumbwheel (<< 4 [5] 6 >>) should be pretty easy to do without any queries at all, so long as you know your maximum number of pages.

Upvotes: 4

metrobalderas

Reputation: 5240

I'm by no means a MySQL expert, but perhaps give up COUNT(*) and go with COUNT(id) instead?

Upvotes: 0

Quassnoi

Reputation: 425261

MySQL is quite good at optimizing LIMIT queries.

That means it picks an appropriate join buffer, filesort buffer, etc., sized just enough to satisfy the LIMIT clause.

Also note that with 45k rows you probably don't need an exact count. Approximate counts can be figured out using separate queries on the indexed fields. Say, this query:

SELECT  COUNT(*)
FROM    mytable
WHERE   col1 = :myvalue
        AND col2 = :othervalue

can be approximated by this one:

-- Count the matches inside a fixed 1000-row sample,
-- then scale up by (total rows / 1000).
SELECT  COUNT(*) *
        (
        SELECT  COUNT(*)
        FROM    mytable
        ) / 1000
FROM    (
        SELECT  col1, col2
        FROM    mytable
        LIMIT 1000
        ) AS sample
WHERE   col1 = :myvalue
        AND col2 = :othervalue

This scales the match rate observed in a fixed 1000-row sample by the table's total row count. It is much more efficient on MyISAM, where the unfiltered COUNT(*) is read from table metadata at almost no cost.

If you post an example of your complex query, I can probably say something more definite about how to improve its pagination.

Upvotes: 1

gnud

Reputation: 78518

My suggestion is to ask MySQL for one row more than you need in each query, and to decide, based on the number of rows in the result set, whether or not to show the next-page link.
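
A sketch of the trick for a hypothetical 20-row page (column names assumed):

-- Page 3 at 20 rows per page: ask for 21 rows starting at offset 40.
SELECT id, title
FROM   mytable
ORDER  BY id
LIMIT  21 OFFSET 40;
-- If 21 rows come back, a next page exists; render only the first 20.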

Upvotes: 4
