Reputation: 161
I want to make the value of a TEXT field unique in my MySQL table.
After a little research I found that everybody discourages using a UNIQUE INDEX on TEXT fields because of performance issues. What I plan to do now is:
1) create another field to hold a hash of the TEXT value (md5(text_value))
2) make this hash field a UNIQUE index
3) use INSERT IGNORE in queries
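The three steps above could be sketched roughly as follows. The table name `documents`, the column `text_hash` and the index name are hypothetical, chosen only for illustration; `MD5()` and `INSERT IGNORE` are standard MySQL.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE documents (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    text_value TEXT NOT NULL,
    text_hash  CHAR(32) NOT NULL,        -- MD5() returns 32 hex characters
    UNIQUE KEY uq_text_hash (text_hash)  -- step 2: uniqueness is enforced on the hash
);

-- Step 3: the first insert stores the row.
INSERT IGNORE INTO documents (text_value, text_hash)
VALUES ('some long text', MD5('some long text'));

-- The same text again produces the same hash, so the row is skipped
-- with a warning instead of a duplicate-key error.
INSERT IGNORE INTO documents (text_value, text_hash)
VALUES ('some long text', MD5('some long text'));
```

One thing to keep in mind with this sketch: `INSERT IGNORE` downgrades other errors to warnings as well, not only duplicate-key errors, so a skipped row does not always mean a duplicate.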
Is this solution complete, secure and optimal? (found it on SO)
Is there a better way of achieving this goal?
Upvotes: 4
Views: 8570
Reputation: 5666
As I was asked in the comments how I would solve this, I'll write it as a response.
Being in such a situation suggests mistakes in the application design. Consider what that means.
You have text whose length you cannot specify in advance, which can be very long (up to 64 KB), and which you want to keep unique. Imagine that amount of data split into separate columns and combined into a composite index to enforce uniqueness. That is effectively what you're trying to do. In terms of integers, it would be a composite index over roughly 16,000 four-byte integers.
Consider further that character-type fields (CHAR, VARCHAR, TEXT) are interpreted according to their encoding (character set and collation), which complicates the issue further.
I'd highly recommend splitting the data up somehow. This not only spares the DBMS from indexing variable-length character blocks, but may also make it possible to build composite keys over parts of the data. Maybe you could even find a better storage solution for your data.
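To illustrate the encoding point, here is a small sketch of how "equal" depends on the collation, while a hash only looks at the raw bytes. It assumes a MySQL version with `utf8mb4` support (5.5.3 or later); the column aliases are made up for the example.

```sql
-- The _utf8mb4 introducers make the literals utf8mb4
-- regardless of the connection character set.
SELECT
    _utf8mb4'abc' COLLATE utf8mb4_general_ci = _utf8mb4'ABC' AS equal_ci,   -- 1: case-insensitive match
    _utf8mb4'abc' COLLATE utf8mb4_bin        = _utf8mb4'ABC' AS equal_bin,  -- 0: binary comparison differs
    MD5('abc') = MD5('ABC')                                  AS same_hash;  -- 0: the hashes differ
```

So a unique index built directly on a character column and a unique index built on a hash of that column can disagree about what counts as a duplicate.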
If you have questions, I'd suggest posting the table and/or database structure and explaining what logical data the TEXT field contains, and why you think it would need to be unique.
Upvotes: 3
Reputation: 3703
It’s almost complete. There is a chance (Birthday Paradox) of a hash collision, so a UNIQUE index on the hash alone isn’t enough: two different texts that happen to share a hash would be treated as duplicates.
You’re better off using the hash together with a comparison of the actual text to be completely safe.
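-- @new_text holds the candidate value; my_table is a placeholder name.
-- The hash narrows the search via the index, the text comparison rules out a collision.
SELECT COUNT(*) FROM my_table
WHERE md5hash = MD5(@new_text)
  AND textvalue = @new_text;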
This could be wrapped into an INSERT or UPDATE trigger, or maybe even a stored procedure, for easy checking.
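A minimal sketch of the trigger variant, under some assumptions: the table name `my_table`, the index name and the trigger name are hypothetical, the hash column uses a plain (non-unique) index so that real collisions between different texts remain insertable, and `SIGNAL` requires MySQL 5.5 or later. `DELIMITER` is a command of the `mysql` command-line client, not part of SQL itself.

```sql
-- Hypothetical table; md5hash and textvalue follow the query above.
CREATE TABLE my_table (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    textvalue TEXT NOT NULL,
    md5hash   CHAR(32) NOT NULL DEFAULT '',  -- filled in by the trigger
    KEY idx_md5hash (md5hash)                -- non-unique: hash collisions are allowed
);

DELIMITER //
CREATE TRIGGER my_table_before_insert
BEFORE INSERT ON my_table
FOR EACH ROW
BEGIN
    -- Always derive the hash from the text being inserted.
    SET NEW.md5hash = MD5(NEW.textvalue);

    -- Reject the row only if the same hash AND the same text already exist.
    IF EXISTS (SELECT 1 FROM my_table
               WHERE md5hash = NEW.md5hash
                 AND textvalue = NEW.textvalue) THEN
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'Duplicate textvalue';
    END IF;
END//
DELIMITER ;
```

With this in place, `INSERT INTO my_table (textvalue) VALUES ('some long text');` succeeds once and raises the error on a repeat. Note that the check is probably not safe against two sessions inserting the same text at the same moment, since neither sees the other's uncommitted row; an UPDATE trigger would need the same logic.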
Have a look at this Stack Overflow question for an example of a hash collision.
Upvotes: 2