Scott
Scott

Reputation: 13

MySQL query slow when selecting VARCHAR

I have this table:

CREATE TABLE `search_engine_rankings` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `keyword_id` int(11) DEFAULT NULL,
  `search_engine_id` int(11) DEFAULT NULL,
  `total_results` int(11) DEFAULT NULL,
  `rank` int(11) DEFAULT NULL,
  `url` varchar(255) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `indexed_at` date DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
  KEY `search_engine_rankings_search_engine_id_fk` (`search_engine_id`),
  CONSTRAINT `search_engine_rankings_keyword_id_fk` FOREIGN KEY (`keyword_id`) REFERENCES `keywords` (`id`) ON DELETE CASCADE,
  CONSTRAINT `search_engine_rankings_search_engine_id_fk` FOREIGN KEY (`search_engine_id`) REFERENCES `search_engines` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=244454637 DEFAULT CHARSET=utf8 

It has about 250M rows in production.

When I do:

select id,
       rank 
  from search_engine_rankings 
 where keyword_id = 19 
   and search_engine_id = 11 
   and indexed_at = "2010-12-03";

...it runs very quickly.

When I add the url column (VARCHAR):

select id,
       rank,
       url 
  from search_engine_rankings 
 where keyword_id = 19 
   and search_engine_id = 11 
   and indexed_at = "2010-12-03";

...it runs very slowly.

Any ideas?

Upvotes: 1

Views: 2284

Answers (4)

tobixen
tobixen

Reputation: 4073

TLDR: I solved this by running optimize on the table.


I experienced the same just now. Even lookups on primary key and selecting just some few rows was slow. Testing a bit, I found it not to be limited to the varchar column, selecting an int also took considerable amounts of time.

A query roughly looking like this took around 3s:

select someint from mytable where id in (1234, 12345, 123456).

While a query roughly looking like this took <10ms:

select count(*) from mytable where id in (1234, 12345, 123456).

The approved answer here is to just make an index spanning someint also, and it will be fast, as mysql can fetch all information it needs from the index and won't have to touch the table. That probably works in some settings, but I think it's a silly workaround - something is clearly wrong, it should not take three seconds to fetch three rows from a table! Besides, most applications just does a "select * from mytable", and doing changes at the application side is not always trivial.

After optimize table, both queries takes <10ms.

Upvotes: 0

Giovani
Giovani

Reputation: 2547

I know it's an old post but I was experiencing the same situation and I didn't found an answer. This really happens in MySQL, when you have varchar columns it takes a lot of time processing. My query took about 20 sec to process 1.7M rows and now is about 1.9 sec.

Ok first of all, create a view from this query:

CREATE VIEW view_one AS 
  select id,rank 
  from search_engine_rankings 
  where keyword_id = 19000 
  and search_engine_id = 11 
  and indexed_at = "2010-12-03";

Second, same query but with an inner join:

select v.*, s.url 
from view_one AS v 
inner join search_engine_rankings s ON s.id=v.id;

Upvotes: 0

ontrack
ontrack

Reputation: 3043

Additionally to Tim's answer. An index in Mysql can only be used left-to-right. Which means it can use columns of your index in your WHERE clause only up to the point you use them.

Currently, your UNIQUE index is keyword_id,search_engine_id,rank,indexed_at. This will be able to filter the columns keyword_id and search_engine_id, still needing to scan over the remaining rows to filter for indexed_at

But if you change it to: keyword_id,search_engine_id,indexed_at,rank (just the order). This will be able to filter the columns keyword_id,search_engine_id and indexed_at

I believe it will be able to fully use that index to read the appropriate part of your table.

Upvotes: 1

Tim
Tim

Reputation: 5421

The first query can be satisfied by the index alone -- no need to read the base table to obtain the values in the Select clause. The second statement requires reads of the base table because the URL column is not part of the index.

 UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),

The rows in tbe base table are not in the same physical order as the rows in the index, and so the read of the base table can involve considerable disk-thrashing.

You can think of it as a kind of proof of optimization -- on the first query the disk-thrashing is avoided because the engine is smart enough to consult the index for the values requested in the select clause; it will already have read that index into RAM for the where clause, so it takes advantage of that fact.

Upvotes: 1

Related Questions