Reputation: 3806
Would an indexed compare of two string fields be faster if those fields were md5'd during the import, and the md5 field used in the compare instead of the source string field?
Upvotes: 1
Views: 922
Reputation: 116100
Probably not. It may depend on the type of index used. You can specify USING HASH
when creating the index. In this case, MySQL will use some kind of hashing (probably MD5 as well) to build the index with, but it will avoid the risk of collision. When two values have the same hash, the hash will just function as a range for both rows.
Upvotes: 0
Reputation: 60498
Possibly, if the strings were very long or very similar.
More importantly though, if the strings are very long, the index for the MD5 column would likely be smaller. Smaller means more likely to fit into memory, and faster to read from the disk. So if there were any performance improvement to be gained by doing this, most of it would probably come from that.
That said, you'd have to weigh any performance gain in reading with the extra cost of calculating the checksum and writing the value and the increased table size due to the new column being added.
Upvotes: 1
Reputation: 229058
If the source string is larger than the md5sum (32 bytes if you hex encode it), yes, using the md5sum would be faster.
While a low probability, you rish hash collision uising the md5 sum for equality though. i.e. 2 fields might comapare as equal while they're not, so think twice about that.
Upvotes: 0