user3767671

Reputation: 51

SQL query: Speed up for huge tables

We have a table with about 25,000,000 rows called 'events' having the following schema:

TABLE events
- campaign_id   : int(10)
- city      : varchar(60)
- country_code  : varchar(2)

The following query takes VERY long (> 2000 seconds):

SELECT COUNT(*) AS counted_events, country_code
FROM events
WHERE campaign_id IN (597)
GROUP BY city, country_code
ORDER BY counted_events

We found out that it's because of the GROUP BY part.

There is already an index idx_campaign_id_city_country_code on (campaign_id, city, country_code) which is used.

Maybe someone can suggest a good solution to speed it up?

Update:

EXPLAIN shows that, out of several possible indexes, MySQL uses 'idx_campaign_id_city_country_code'. For 'rows' it shows '471304', and for 'Extra' it shows 'Using where; Using temporary; Using filesort'.

Here is the whole result of EXPLAIN:

UPDATE:

Ok, I think it has been solved:

Looking at the pasted query again, I realized that I forgot to mention that the SELECT contained one more column, 'country_name'. The query was very slow with country_name included; if I leave it out, the performance of the query is absolutely fine. Sorry for that mistake!

So thank you for all your helpful comments, I'll upvote all the good answers! There were some really helpful additions that we will probably also apply (like changing types etc.).

Upvotes: 5

Views: 1582

Answers (4)

XL_

Reputation: 699

  • partitioning - especially by country will not help
  • column IN (const-list) is not slow, it is in fact a case with special optimization

The problem is that MySQL doesn't use the index for sorting. I cannot say why, because it should. It could be a bug.

The best strategy to execute this query is to scan the sub-tree of the index where campaign_id = 597. Since the index is then sorted by city, country_code, no extra sorting is needed and rows can be counted while scanning.

So the indexes are already optimal for this query. MySQL is just not using them correctly.


I got more information offline. It seems this is not a database problem at all, but:

  1. the schema is not normalized. The table contains not only country_code, but also country_name (this should be in an extra table).
  2. the real query contains country_name in the select list. But since that column is not indexed, MySQL cannot use an index scan.

As soon as country_name is dropped from the select list, the query reverts to an index-only scan ("using index" in EXPLAIN output) and is blazingly fast.
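
A minimal sketch of two ways to keep country_name in the result, assuming MySQL and assuming country_name depends only on country_code. The table name countries, the index name idx_campaign_city_cc_name and the column sizes are made up for illustration; they are not from the original schema.

```sql
-- Option 1 (assumed): move country_name into a lookup table keyed by country_code.
CREATE TABLE countries (
    country_code CHAR(2)      NOT NULL PRIMARY KEY,
    country_name VARCHAR(100) NOT NULL
);

-- Aggregate on the indexed columns only, then attach the name afterwards.
SELECT e.counted_events, e.city, e.country_code, c.country_name
FROM (
    SELECT COUNT(*) AS counted_events, city, country_code
    FROM events
    WHERE campaign_id = 597
    GROUP BY city, country_code
) AS e
LEFT JOIN countries AS c ON c.country_code = e.country_code
ORDER BY e.counted_events;

-- Option 2 (assumed): keep the schema as it is and extend the index
-- so that it also covers country_name.
ALTER TABLE events
    ADD INDEX idx_campaign_city_cc_name (campaign_id, city, country_code, country_name);
```

With either option the inner aggregation only touches indexed columns, so EXPLAIN should be checked again for "Using index" before relying on it.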

Upvotes: 0

borjab

Reputation: 11655

Some ideas:

  • Given the nature and size of the table, it would be a great candidate for partitioning by country. This way the events of every country would be stored in a separate physical partition, even though the table still behaves as one big virtual table.

  • Is country_code a string? Maybe you have a country_id that would be easier to sort. (It may force you to create or change indexes.)

  • Are you really using the city in the group by?
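
The partitioning idea could look roughly like the sketch below, assuming MySQL. The partition count is arbitrary, and the statement only works if every primary or unique key on events includes country_code, which the question does not tell us. Whether it helps this particular query is uncertain, since the query filters on campaign_id rather than country_code.

```sql
-- Assumption: every primary/unique key on events includes country_code;
-- MySQL rejects the statement otherwise. 16 partitions is an arbitrary choice.
ALTER TABLE events
    PARTITION BY KEY (country_code)
    PARTITIONS 16;
```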

Upvotes: 0

prcvcc

Reputation: 2230

Without seeing what EXPLAIN says it's a long shot, but anyway:

  1. make an index on (city,country_code)
  2. see if there's a way to use partitioning, your table is getting rather huge
  3. if country code is always 2 chars change it to char
  4. change numeric indexes to unsigned int

post entire EXPLAIN output
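
A rough sketch of points 1, 3 and 4 plus the EXPLAIN call, assuming MySQL. The index name idx_city_country_code is hypothetical, and NOT NULL is an assumption about the real columns; the type change to unsigned would also fail in strict mode if negative ids exist.

```sql
-- 1. Extra index on the grouping columns (index name is hypothetical).
CREATE INDEX idx_city_country_code ON events (city, country_code);

-- 3. and 4. MODIFY replaces the whole column definition, so restate
--    NULL / NOT NULL and defaults to match the real table.
--    NOT NULL here is an assumption.
ALTER TABLE events
    MODIFY country_code CHAR(2) NOT NULL,
    MODIFY campaign_id INT UNSIGNED NOT NULL;

-- The output to post: the execution plan of the slow query.
EXPLAIN
SELECT COUNT(*) AS counted_events, country_code
FROM events
WHERE campaign_id IN (597)
GROUP BY city, country_code
ORDER BY counted_events;
```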

Upvotes: 3

low_rents

Reputation: 4481

don't use IN() - better use:

WHERE campaign_id = 597
OR campaign_id = 231
OR ....

afaik IN() is very slow.

update: like nik0lias commented - IN() is faster than concatenating OR conditions.
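
For comparison, the two forms side by side, as a small sketch against the events table from the question. Both predicates select the same rows, so either can be run through EXPLAIN to compare the plans.

```sql
-- IN with a constant list.
SELECT COUNT(*) FROM events WHERE campaign_id IN (597, 231);

-- The same filter written as chained OR conditions.
SELECT COUNT(*) FROM events WHERE campaign_id = 597 OR campaign_id = 231;
```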

Upvotes: 0
