Unibit
Unibit

Reputation: 29

MySQL row count

I have a very large table (~1 000 000 rows) and complicated query with unions, joins and where statements (user can select different ORDER BY columns and directions). I need to get a row count for pagination. If I run query without counting rows it completes very fast. How can I implement pagination in fastest way? I tried to use EXPLAIN SELECT and SHOW TABLE STATUS to get approximate row count, but it is very different from real row count. My query is like this one (simplyfied):

SELECT * FROM (
    (   
        SELECT * FROM table_1 
        LEFT JOIN `table_a` ON table_1.record_id = table_a.id 
        LEFT JOIN `table_b` ON table_a.id = table_b.record_id 
        WHERE table_1.a > 10 AND table_a.b < 500 AND table_b.c = 1 
        ORDER BY x ASC
        LIMIT 0, 10        
    )      
    UNION        
    (   
        SELECT * FROM table_2
        LEFT JOIN `table_a` ON table_2.record_id = table_a.id 
        LEFT JOIN `table_b` ON table_a.id = table_b.record_id 
        WHERE table_2.d < 10 AND table_a.e > 500 AND table_b.f = 1 
        ORDER BY x ASC
        LIMIT 0, 10                                 
    )                 
) tbl ORDER BY x ASC LIMIT 0, 10

Query result without limiting is about ~100 000 rows, how can I get this approximate count in fastest way? My production query example is like this one:

SELECT SQL_CALC_FOUND_ROWS * FROM (
    (   
        SELECT
          articles_log.id AS log_id, articles_log.source_table,
          articles_log.record_id AS id, articles_log.dat AS view_dat, 
          articles_log.lang AS view_lang, '1' AS view_count, '1' AS unique_view_count,
          articles_log.user_agent, articles_log.ref, articles_log.ip,
          articles_log.ses_id, articles_log.bot, articles_log.source_type, articles_log.link,   
          articles_log.user_country, articles_log.user_platform,
          articles_log.user_os, articles_log.user_browser,                                          
          `contents`.dat AS source_dat, `contents_trans`.header, `contents_trans`.custom_text 
        FROM articles_log 
        INNER JOIN `contents` ON articles_log.record_id = `contents`.id
                             AND articles_log.source_table = 'contents'  
        INNER JOIN `contents_trans` ON `contents`.id = `contents_trans`.record_id
                                   AND `contents_trans`.lang='lv' 
        WHERE articles_log.dat > 0
          AND articles_log.dat >= 1488319200
          AND articles_log.dat <= 1489355999
          AND articles_log.bot = '0'
          AND (articles_log.record_id NOT LIKE '%\_404' AND articles_log.record_id <> '404'
               OR articles_log.source_table <> 'contents') 
    )      
    UNION        
    (   
        SELECT
          articles_log.id AS log_id, articles_log.source_table,
          articles_log.record_id AS id, articles_log.dat AS view_dat, 
          articles_log.lang AS view_lang, '1' AS view_count, '1' AS unique_view_count,
          articles_log.user_agent, articles_log.ref, articles_log.ip,
          articles_log.ses_id, articles_log.bot,
          articles_log.source_type, articles_log.link,   
          articles_log.user_country, articles_log.user_platform,
          articles_log.user_os, articles_log.user_browser,                                          
        `news`.dat AS source_dat, `news_trans`.header, `news_trans`.custom_text 
        FROM articles_log 
        INNER JOIN `news` ON articles_log.record_id = `news`.id
                         AND articles_log.source_table = 'news'  
        INNER JOIN `news_trans` ON `news`.id = `news_trans`.record_id
                         AND `news_trans`.lang='lv' 
        WHERE articles_log.dat > 0 
          AND articles_log.dat >= 1488319200
          AND articles_log.dat <= 1489355999
          AND articles_log.bot = '0'
          AND (articles_log.record_id NOT LIKE '%\_404' AND articles_log.record_id <> '404'
               OR articles_log.source_table <> 'contents') 
    )      
) tbl ORDER BY view_dat ASC LIMIT 0, 10 

Many thanks!

Upvotes: 0

Views: 223

Answers (2)

Paul Spiegel
Paul Spiegel

Reputation: 31772

If you can use UNION ALL instead of UNION (which is a shortcut for UNION DISTINCT) - In other words - If you don't need to remove duplicates you can try to add the counts of the two subqueries:

SELECT 
    (   
        SELECT COUNT(*) FROM table_1 
        LEFT JOIN `table_a` ON table_1.record_id = table_a.id 
        LEFT JOIN `table_b` ON table_a.id = table_b.record_id 
        WHERE table_1.a > 10 AND table_a.b < 500 AND table_b.c = 1      
    )      
    +
    (   
        SELECT COUNT(*) FROM table_2
        LEFT JOIN `table_a` ON table_2.record_id = table_a.id 
        LEFT JOIN `table_b` ON table_a.id = table_b.record_id 
        WHERE table_2.d < 10 AND table_a.e > 500 AND table_b.f = 1                              
    ) 
    AS cnt

Without ORDER BY and without UNION the engine might not need to create a huge temp table.

Update

For your original query try the following:

  • Select only count(*).
  • Remove OR articles_log.source_table <> 'contents' from first part (contents) since we know it's never true.
  • Remove AND (articles_log.record_id NOT LIKE '%\_404' AND articles_log.record_id <> '404' OR articles_log.source_table <> 'contents') from second part (news) since we know it's allways true because OR articles_log.source_table <> 'contents' is allways true.
  • Remove the joins with contents and news. You can join the *_trans tables directly using record_id
  • Remove articles_log.dat > 0 since it's redundant with articles_log.dat >= 1488319200

The resulting query:

SELECT (   
    SELECT COUNT(*)
    FROM articles_log 
    INNER JOIN `contents_trans`
      ON `contents_trans`.record_id = articles_log.record_id
      AND `contents_trans`.lang='lv' 
    WHERE articles_log.bot = '0'
      AND articles_log.dat >= 1488319200
      AND articles_log.dat <= 1489355999
      AND articles_log.record_id NOT LIKE '%\_404'
      AND articles_log.record_id <> '404'
) + (   
    SELECT COUNT(*)
    FROM articles_log 
    INNER JOIN `news_trans`
      ON  `news_trans`.record_id = articles_log.record_id
      AND `news_trans`.lang='lv' 
    WHERE articles_log.bot = '0'
      AND articles_log.dat >= 1488319200
      AND articles_log.dat <= 1489355999
) AS cnt

Try the following index combinations:

articles_log(bot, dat, record_id)
contents_trans(lang, record_id)
news_trans(lang, record_id)

or

contents_trans(lang, record_id)
news_trans(lang, record_id)
articles_log(record_id, bot, dat)

It depends on the data, which combination ist the better one.

I might be wrong on one ore more points, since i don't know your data and business logic. If so, try to adjust the other.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269543

You can get the calculation when you run the query using SQL_CALC_FOUND_ROWS as explained in the documentation:

select SQL_CALC_FOUND_ROWS *
. . .

And then running:

select FOUND_ROWS()

However, the first run needs to generate all the data, so you are going to get up to 20 possible rows -- I don't think it respects LIMIT in subqueries.

Given the structure of your query and you want to do, I would think first about optimizing the query. For instance, is UNION really needed (it incurs overhead for removing duplicates)? As pointed out in a comment, your joins are really inner joins disguised as outer joins. Indexes might improve performance.

You might want to ask another question, providing sample data and desired results to get advice on such issues.

Upvotes: 0

Related Questions