konung
konung

Reputation: 7038

Adding unique index to a MySQL text field string using hashing in Ruby

I have a Rails app with a model (table) that has 1,500,000 records, with text field size ranging from 50 to 8000 characters.

I need to ensure that a text field is unique. Since I can't use MySQL unique index on text field, my solution is to convert my text field into a hash ( using Digest::SHA256.hexdigest) and save this hash into a varchar field called "body_hash". Then add a unique index, on that field.

Questions:

Upvotes: 0

Views: 587

Answers (2)

odlp
odlp

Reputation: 5124

You may want to consider generating this checksum / digest in the DB itself. This would be faster to back fill values for existing data rather than process in Ruby.

Combining MySQL's CREATE_DIGEST to populate a body_digest column:

CREATE_DIGEST('SHA512', 'The quick brown fox');

https://dev.mysql.com/doc/refman/8.0/en/enterprise-encryption-functions.html#function_create-digest

And an BEFORE INSERT / BEFORE UPDATE trigger to set this checksum value:

https://dev.mysql.com/doc/refman/8.0/en/trigger-syntax.html

Worth mentioning there's a Ruby library for declaring database triggers on ActiveRecord models: https://github.com/jenseng/hair_trigger

Upvotes: 1

tadman
tadman

Reputation: 211560

The Rails solution is a before_save hook. You could also do this with a database trigger but that's a lot more messy and brittle.

SHA256 is probably fine here, as it's SHA2-based. The chance of a collision should be vanishingly small.

Using a consistent length hash instead of the text for the unique index is actually a great idea because MySQL's indexes get larger for longer strings. A short hex-string is a lot easier for the indexing engine to handle and still provides the uniqueness constraint you want.

MySQL's unique constraints actually provide an ordering mechanism as well, that's the source of pain here, but if you aren't concerned with ordering the hash solution is a great alternative.

Upvotes: 1

Related Questions