Reputation: 3389
Let's assume I have a MySQL table named records. The table schema would be as follows, where id is an index key and url is unique:
id BIGINT(20) UNSIGNED AUTO_INCREMENT
num_chars SMALLINT(4) UNSIGNED
url VARCHAR(1000) UNIQUE
This would be the table's data representation, basically:
-------------------------------------------
| id | num_chars | url |
-------------------------------------------
| 1 | 22 | https://www.google.com |
| 2 | 17 | https://yahoo.com |
| 3 | 16 | https://bing.com |
-------------------------------------------
num_chars is the url's number of characters.
My question is, considering the fact that this table will probably hit several millions of records: is there a performance improvement of this query:
SELECT * FROM records WHERE num_chars = 17 AND url = 'https://yahoo.com';
Over this one:
SELECT * FROM records WHERE url = 'https://yahoo.com';
I know that integer-based queries are more efficient than string-based ones (correct me if I'm wrong), so I wonder whether filtering by num_chars before url would be an efficiency improvement.
By the way, the advantage in this case is that I can easily calculate num_chars from url before performing the MySQL query, using PHP, Java, Python, etc.
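For illustration, here is a minimal Python sketch of computing num_chars on the client side and passing it along with the url. The helper name build_lookup is hypothetical, and the %s placeholders assume a MySQL driver that uses that parameter style (PyMySQL and mysql-connector-python do); nothing here talks to a database.

```python
def build_lookup(url: str) -> tuple[str, tuple[int, str]]:
    """Return the SQL text and parameters for the two-condition lookup.

    build_lookup is a hypothetical helper, not part of any library.
    len(url) counts characters (code points), not bytes.
    """
    sql = "SELECT * FROM records WHERE num_chars = %s AND url = %s"
    return sql, (len(url), url)


if __name__ == "__main__":
    sql, params = build_lookup("https://yahoo.com")
    # With a real connection this would be: cursor.execute(sql, params)
    print(sql, params)
```

The returned pair is meant to be handed to a DB-API cursor's execute method, so the values are never concatenated into the SQL string.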
Upvotes: 0
Views: 74
Reputation: 415745
Is there a performance improvement?
The answer depends on two things:
First, the selectivity of the num_chars column. If a lot of your data comes from a few different sources (things like URL shorteners, Amazon product links, and so on; really any system where you have a relatively small number of possible lengths), then adding that num_chars=17 condition is still going to match a lot of rows and not actually filter things down much.
Second, indexing. An index on url directly, with no other indexes, is likely to make that condition outperform the num_chars condition regardless of selectivity. However, placing both num_chars and url into a single index, in that order, might be able to take good advantage of the additional field, even with poor selectivity.
But remember: database vendors aren't stupid. They devote a lot of effort to finding ways to optimize queries. There are good odds the engine is already doing this kind of thing behind the scenes. The best thing you could do is generate some sample data in a table and test it, to know what will really happen.
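As a starting point for that kind of test, sample rows could be generated with a short script like the one below. This is only a sketch: make_sample_rows is a hypothetical helper, the random ".com" hosts are made up and will not resemble a real distribution of url lengths, and loading the rows into MySQL (for example with a driver's executemany) is left out.

```python
import random
import string


def make_sample_rows(n: int, seed: int = 0) -> list[tuple[int, str]]:
    """Build n distinct (num_chars, url) rows with random host names.

    Hypothetical helper for test data only; real urls will have a
    different length distribution, which affects selectivity.
    """
    rng = random.Random(seed)
    hosts: set[str] = set()
    while len(hosts) < n:
        length = rng.randint(3, 20)
        hosts.add("".join(rng.choices(string.ascii_lowercase, k=length)))
    rows = []
    for host in sorted(hosts):  # sorted so the output order is repeatable
        url = f"https://{host}.com"
        rows.append((len(url), url))
    return rows


if __name__ == "__main__":
    for num_chars, url in make_sample_rows(5):
        print(num_chars, url)
```

Once the rows are loaded, running both queries under EXPLAIN in MySQL shows which index, if any, each one uses.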
Finally, if you really want to do this, consider making it a Generated Column.
Upvotes: 0
Reputation: 1269773
You have a unique index on url. So, both queries will use this index.
Adding an additional check on the length is not going to speed up the query. There will be a very, very, very small additional overhead for the length check, but that is immaterial.
When you have a unique index, there is no need to add additional checks.
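One way to see the general idea without a MySQL server is the sketch below, which uses Python's built-in sqlite3 module as a stand-in. This is an assumption-heavy illustration: SQLite's planner is not MySQL's, the column types are simplified, and plans_for_both_queries is a hypothetical helper. On MySQL itself, EXPLAIN on each query is the thing to check.

```python
import sqlite3


def plans_for_both_queries() -> dict[str, str]:
    """Return SQLite's query plan text for each of the two lookups.

    SQLite is only a stand-in here; it is not MySQL and may plan
    other queries differently.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "num_chars INTEGER, "
        "url TEXT UNIQUE)"
    )
    urls = ("https://www.google.com", "https://yahoo.com", "https://bing.com")
    conn.executemany(
        "INSERT INTO records (num_chars, url) VALUES (?, ?)",
        [(len(u), u) for u in urls],
    )
    queries = {
        "with_num_chars": "SELECT * FROM records "
        "WHERE num_chars = 17 AND url = 'https://yahoo.com'",
        "url_only": "SELECT * FROM records WHERE url = 'https://yahoo.com'",
    }
    plans = {}
    for name, sql in queries.items():
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        # The last column of each plan row is the human-readable detail.
        plans[name] = " ".join(row[-1] for row in rows)
    conn.close()
    return plans


if __name__ == "__main__":
    for name, plan in plans_for_both_queries().items():
        print(name, "->", plan)
```

In this stand-in, both plans search the unique index on url, since num_chars has no index of its own.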
Note: The advantage of an integer comparison over a string comparison arises when you don't need to do a string comparison. In this case, you need to do the string comparison.
There might be a tiny gain if you hashed the string to an integer and compared that before comparing the string.
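A rough sketch of that hashing idea, computed on the client in Python, might look like this. The url_hash column and both helper names are hypothetical (the table in the question has no such column), CRC-32 is just one possible hash, and the %s placeholders assume a MySQL driver that uses that parameter style. Because hashes can collide, the url comparison stays in the query.

```python
import zlib


def url_hash(url: str) -> int:
    """32-bit unsigned CRC-32 of the UTF-8 bytes of url (hypothetical choice of hash)."""
    return zlib.crc32(url.encode("utf-8"))


def build_hashed_lookup(url: str) -> tuple[str, tuple[int, str]]:
    """SQL and parameters for a lookup that checks the integer hash first.

    Assumes a hypothetical indexed integer column named url_hash.
    """
    sql = "SELECT * FROM records WHERE url_hash = %s AND url = %s"
    return sql, (url_hash(url), url)


if __name__ == "__main__":
    print(build_hashed_lookup("https://yahoo.com"))
```

For the integer check to help at all, url_hash would need its own index; with only the unique index on url, the engine would still look the row up by url.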
Upvotes: 1
Reputation: 108400
Without an appropriate index defined, both of those queries are going to suck.
It's not actually true that integer queries are more efficient than text-based ones; we can demonstrate text-based queries that are blazing fast, and integer queries that are glacial. (At least, it's not true enough in this case to make any difference.)
What matters, what does make a difference for large sets, is effective use of an available index.
With several million rows, we need to consider the distribution of the num_chars values. For outliers, where there are only a couple dozen rows, an index search on num_chars will be fast. But for larger sets, we still need to evaluate the url to see if it matches.
I'd just create a covering index for the query:
CREATE UNIQUE INDEX mytable_ix1 ON mytable (url, num_chars, id) ;
Then run whichever query you want; we expect the same execution plan, so performance will be the same.
Upvotes: 0