Eduardo Escobar

Reputation: 3389

MySQL query performance improvement, WHERE integer condition before string condition

Let's assume I have a MySQL table named records. The table schema is as follows, where id is an indexed key and url is unique:

id BIGINT(20) UNSIGNED AUTO_INCREMENT
num_chars SMALLINT(4) UNSIGNED
url VARCHAR(1000) UNIQUE

This is basically what the table's data looks like:

-------------------------------------------
| id | num_chars |         url            |
-------------------------------------------
|  1 |    22     | https://www.google.com |
|  2 |    17     | https://yahoo.com      |
|  3 |    16     | https://bing.com       |
-------------------------------------------

num_chars is the url's number of characters.

My question is, considering that this table will probably reach several million records: does this query perform better:

SELECT * FROM records WHERE num_chars = 17 AND url = 'https://yahoo.com';

than this one:

SELECT * FROM records WHERE url = 'https://yahoo.com';

I know that integer-based queries are more efficient than string-based ones (correct me if I'm wrong), so I wonder whether filtering by num_chars before url would be more efficient.

By the way, the advantage in this case is that I can easily calculate num_chars from url before performing the MySQL query, using PHP, Java, Python, etc.

Upvotes: 0

Views: 74

Answers (3)

Joel Coehoorn

Reputation: 415745

Is there a performance improvement?

The answer depends on two things:

  1. The selectivity of the num_chars column. If a lot of your data comes from a few different sources (things like url shorteners, amazon product links, or really any system that produces a relatively small number of possible lengths), then adding that num_chars=17 condition is still going to match a lot of rows and not actually filter things down much.
  2. The index choices made for the table. An index on url directly, with no other indexes, is likely to make that condition outperform the num_chars condition regardless of selectivity. However, placing both num_chars and url into a single index, in that order, might be able to take good advantage of the additional field, even with poor selectivity.
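As a rough sketch of how you might check both points against the records table from the question: the first statement shows how many rows share each length, and the second builds the two-column index described above. The index name ix_len_url is made up for illustration.

```sql
-- How many rows share each length? A few very large groups mean
-- that num_chars on its own filters out little.
SELECT num_chars, COUNT(*) AS row_count
FROM records
GROUP BY num_chars
ORDER BY row_count DESC
LIMIT 10;

-- Composite index with num_chars first, then url.
-- ix_len_url is a hypothetical name.
CREATE INDEX ix_len_url ON records (num_chars, url);
```

Whether the optimizer prefers this index over the existing unique index on url is something to confirm by testing rather than assume.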

But remember: database vendors aren't stupid. They devote a lot of effort to finding ways to optimize queries. There are good odds the engine is already doing this kind of thing behind the scenes. The best thing you could do is generate some sample data in a table and test it, to know what will really happen.
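One way to set up such a test, as a sketch only. It assumes MySQL 8.0 or later (for recursive CTEs), a scratch table called records_test, synthetic example.com URLs, and an ascii character set for url; none of those come from the question.

```sql
-- Scratch copy of the schema; records_test is a hypothetical name.
-- Assumption: ascii, so a unique index over the full 1000 characters
-- stays within InnoDB's index key length limit.
CREATE TABLE records_test (
  id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  num_chars SMALLINT UNSIGNED NOT NULL,
  url       VARCHAR(1000) CHARACTER SET ascii NOT NULL,
  UNIQUE KEY uq_url (url)
);

-- Recursive CTEs stop after 1000 iterations by default;
-- raise the limit for this session only.
SET SESSION cte_max_recursion_depth = 200000;

-- Insert 100,000 synthetic URLs together with their lengths.
INSERT INTO records_test (num_chars, url)
WITH RECURSIVE seq (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM seq WHERE n < 100000
)
SELECT CHAR_LENGTH(CONCAT('https://example.com/page/', n)),
       CONCAT('https://example.com/page/', n)
FROM seq;

-- Run both forms of the query and compare the reported timings.
SELECT * FROM records_test
WHERE num_chars = 29 AND url = 'https://example.com/page/1234';

SELECT * FROM records_test
WHERE url = 'https://example.com/page/1234';
```

Synthetic URLs like these have only a handful of distinct lengths, so a realistic test would load a sample of your actual data instead.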

Finally, if you really want to do this, consider making it a Generated Column.
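A minimal sketch of what that could look like, assuming MySQL 5.7 or later and that you are willing to replace the manually maintained column. It uses CHAR_LENGTH because the question counts characters rather than bytes.

```sql
-- Drop the hand-maintained column, then let MySQL compute it instead.
-- STORED persists the value so it can be indexed like a normal column.
ALTER TABLE records DROP COLUMN num_chars;

ALTER TABLE records
  ADD COLUMN num_chars SMALLINT UNSIGNED
  GENERATED ALWAYS AS (CHAR_LENGTH(url)) STORED;
```

With this in place the application no longer sets num_chars on insert, so the value cannot drift out of sync with url.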

Upvotes: 0

Gordon Linoff

Reputation: 1269773

You have a unique index on url. So, both queries will use this index.

Adding an additional check on the length is not going to speed up the query. There will be a very, very, very small additional overhead for the length check, but that is immaterial.

When you have a unique index, there is no need to add additional checks.

Note: The advantage of an integer comparison over a string comparison arises when you don't need to do a string comparison. In this case, you need to do the string comparison.

There might be a tiny gain if you hashed the string to an integer and compared that before comparing the string.
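For illustration, here is one way the hash idea could be sketched in MySQL 5.7 or later, using the built-in CRC32() function in a generated column. The names url_crc and ix_url_crc are made up, and CRC32 is just one possible hash.

```sql
-- Store a 32-bit hash of the url next to it (url_crc is a hypothetical name).
ALTER TABLE records
  ADD COLUMN url_crc INT UNSIGNED
  GENERATED ALWAYS AS (CRC32(url)) STORED;

-- Index the hash so the integer comparison can use an index lookup.
CREATE INDEX ix_url_crc ON records (url_crc);

-- Different urls can share a hash, so the string comparison must stay
-- in the query to rule out collisions.
SELECT *
FROM records
WHERE url_crc = CRC32('https://yahoo.com')
  AND url = 'https://yahoo.com';
```

Since url already has a unique index, treat this as a demonstration of the technique; any gain would need to be measured.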

Upvotes: 1

spencer7593

Reputation: 108400

Without an appropriate index defined, both of those queries are going to suck.

It's not actually true that integer queries are more efficient than text-based ones; we can demonstrate text-based queries that are blazing fast, and integer queries that are glacial. (At least, it's not true enough in this case to make any difference.)

What matters, and what does make a difference for large sets, is effective use of an available index.


With several million rows, we need to consider the distribution of the num_chars values. For outliers, where only a couple dozen rows share a given length, an index search on num_chars will be fast. But for larger sets, we still need to evaluate the url to see if it matches.


I'd just create a covering index for the query:

CREATE UNIQUE INDEX records_ix1 ON records (url, num_chars, id);

Then run whichever query you want; we expect the same execution plan, so performance will be the same.
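To check that expectation rather than take it on faith, EXPLAIN can be run on both forms of the query from the question:

```sql
-- Compare the key, type and rows columns of the two result rows.
EXPLAIN SELECT * FROM records
WHERE num_chars = 17 AND url = 'https://yahoo.com';

EXPLAIN SELECT * FROM records
WHERE url = 'https://yahoo.com';
```

If both report the same key and access type, the extra num_chars condition is not changing how the row is located.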

Upvotes: 0
