codeNinja
codeNinja

Reputation: 1462

adding where clause in query with index slows down substantially

I have a table with 640k records. There is a column called COUNTRY_CD. Its always set to USA. There is an index on the COUNTRY_CD column.

I ran the following query which took 6 sec

select sum(ga) from sales_data

Ran the following query which took 40 sec

Select sum(ga) from sales_data where country_cd='usa'

As i said that there is an index on country_cd. Why does it take so much time to run the second query?


Update:

i increased the innodb_buffer_pool_size. The first query now runs in 1 sec and the second in 6 sec. What more can i do. I really need them to be sub 1 second queries.

Upvotes: 1

Views: 1054

Answers (1)

dognose
dognose

Reputation: 20889

While your overall performance is pretty slow, a reason for the difference could be the way, mysql is using indexes.

If you are running the query without a where-clause, mysql does a full table scan to sum up values. Usually mysql is now reading from the datatable in the order of insert, leading to sequential disk i/o. (fast)

When you are using the where-clause on an indexed column, mysql will use that index. So, it will start to process the index sequentialy but for every match, it will do a lookup of the ga value in the actual datatable, which then leads to random disk i/o

Here is a nice example, how sequential disk access can be way faster than using indexes (and random disk access): https://www.percona.com/blog/2012/11/23/full-table-scan-vs-full-index-scan-performance/

The example shows about the same relation (6 times slower) when using a very bad index - which your index is, if every data-row contains the same value.

The link also outlines, that this only matters if the table does not fit into memory (maybe your dev machine has not given enough memory to mysql?)


I just read through the comments there, big discussion about if there is really sequential disk-io possible. I cannot tell this for sure, but at least the results and the difference compared to in-memory operations are showing that the access of the FULL-TABLE is faster, than accessing the table based on indexes, if it has not yet been loaded to the memory.

Upvotes: 3

Related Questions