Reputation: 25282
I have MySQL InnoDb table where I want to store long (limit is 20k symbols) strings. Is there any way to create index for this field?
Upvotes: 8
Views: 9678
Reputation: 2793
Use the sha2 function with a specific length. Add this to your table:
`hash` varbinary(32) GENERATED ALWAYS AS (unhex(sha2(`your_text`,256)))
ADD UNIQUE KEY `ix_hash` (`hash`);
In 2011, Internet Engineering Task Force (IETF) published RFC 6151, describing possible attacks on MD5. Some attacks could generate collisions in less than a minute on an average computer.
Upvotes: 1
Reputation: 1916
you can put an MD5 of the field into another field and index that. then when u do a search, u match versus the full field that is not indexed and the md5 field that is indexed.
SELECT *
FROM large_field = "hello world hello world ..."
AND large_field_md5 = md5("hello world hello world ...")
large_field_md5 is index and so we go directly to the record that matches. Once in a blue moon it might need to test 2 records if there is a duplicate md5.
Upvotes: 11
Reputation: 1037
Here is an example how you could do that. As @Gidon Wise mentioned in his answer you can index the additional field. In this case it will be query_md5
.
CREATE TABLE `searches` (
`id` int(10) UNSIGNED NOT NULL,
`query` varchar(10000) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`query_md5` varchar(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
) ENGINE=InnoDB;
ALTER TABLE `searches`
ADD PRIMARY KEY (`id`),
ADD KEY `searches_query_md5_index` (`query_md5`);
To make sure you will not have any similar md5 hashes you want to double check by doing and `query` =''
.
The query will look like this:
select * from `searches` where `query_md5` = "b6d31dc40a78c646af40b82af6166676" and `query` = 'long string ...'
b6d31dc40a78c646af40b82af6166676
is md5 hash of the long string ...
string. This, I think can improve query performance and you can be sure that you will get right results.
Upvotes: 4
Reputation: 234795
You will need to limit the length of the index, otherwise you are likely to get error 1071 ("Specified key was too long"). The MySQL manual entry on CREATE INDEX describes this:
Indexes can be created that use only the leading part of column values, using col_name(length) syntax to specify an index prefix length:
Prefixes can be specified for CHAR, VARCHAR, BINARY, and VARBINARY columns.
BLOB and TEXT columns also can be indexed, but a prefix length must be given.
Prefix lengths are given in characters for nonbinary string types and in bytes for binary string types. That is, index entries consist of the first length characters of each column value for CHAR, VARCHAR, and TEXT columns, and the first length bytes of each column value for BINARY, VARBINARY, and BLOB columns.
It also adds this:
Prefix support and lengths of prefixes (where supported) are storage engine dependent. For example, a prefix can be up to 1000 bytes long for MyISAM tables, and 767 bytes for InnoDB tables.
Upvotes: 5