Reputation: 7038
I have a Rails app with a model (table) that has 1,500,000 records, with text field size ranging from 50 to 8000 characters.
I need to ensure that a text field is unique. Since I can't use MySQL unique index on text field, my solution is to convert my text field into a hash ( using Digest::SHA256.hexdigest) and save this hash into a varchar field called "body_hash". Then add a unique index, on that field.
Questions:
Upvotes: 0
Views: 587
Reputation: 5124
You may want to consider generating this checksum / digest in the DB itself. This would be faster to back fill values for existing data rather than process in Ruby.
Combining MySQL's CREATE_DIGEST
to populate a body_digest
column:
CREATE_DIGEST('SHA512', 'The quick brown fox');
https://dev.mysql.com/doc/refman/8.0/en/enterprise-encryption-functions.html#function_create-digest
And an BEFORE INSERT
/ BEFORE UPDATE
trigger to set this checksum value:
https://dev.mysql.com/doc/refman/8.0/en/trigger-syntax.html
Worth mentioning there's a Ruby library for declaring database triggers on ActiveRecord models: https://github.com/jenseng/hair_trigger
Upvotes: 1
Reputation: 211560
The Rails solution is a before_save
hook. You could also do this with a database trigger but that's a lot more messy and brittle.
SHA256 is probably fine here, as it's SHA2-based. The chance of a collision should be vanishingly small.
Using a consistent length hash instead of the text for the unique index is actually a great idea because MySQL's indexes get larger for longer strings. A short hex-string is a lot easier for the indexing engine to handle and still provides the uniqueness constraint you want.
MySQL's unique constraints actually provide an ordering mechanism as well, that's the source of pain here, but if you aren't concerned with ordering the hash solution is a great alternative.
Upvotes: 1