Reputation: 364
Let's assume we have this table
ID | MAIL
1 | [email protected]
2 | [email protected]
3 | [email protected]
you get the point... I want to query this table now and I receive as input the MD5 hash of the MAIL column values. So my queries would look like this:
SELECT * FROM TABLE
WHERE HashBytes('MD5', MAIL ) = 'CE42C51D0B63DF9F616CCFB4C3FCF16C'
Now imagine this table is a very big table, thousand of thousand of records. My query becomes really slow. What I want to do now is to add an index, but using MD5. I read it can be done on PostgreSQL using something like this:
CREATE INDEX mail_md5 ON TABLE(DECODE(MD5(MAIL), 'HEX'))
Can I do something similar in SQL Server?
Note: I can't add another column with the hash values :(
Note 2: I know a HASH is an irreversible function, the example for the PostgreSQL index was copied from here
Upvotes: 1
Views: 2145
Reputation: 5133
I'm not sure what you are allowed to do schema-wise, but if you can modify the schema then SQL Server has something called Indexed Views, which are views that are stored in memory (vs computed on the fly).
You can query the view instead of the underlying table, and SQL Server will keep it all up-to-date for you. The key phrase is WITH SCHEMABINDING
, which tells SQL Server to keep the computed fields in memory.
For example:
CREATE VIEW HashedAddresses
WITH SCHEMABINDING
AS
SELECT ID, MAIL, HASHBYTES('MD5',MAIL) as HashedMailMD5 from myschema.mytable;
Then you can create a unique clustered index on your hash field:
CREATE UNIQUE CLUSTERED INDEX IndexHashedAddresses ON HashedAddresses(HashedMailMD5);
after which this should be fast:
SELECT ID FROM HashedAddresses WHERE HashedMailMD5 = '0x121....'
Issue: if you get an MD5 collision, the index will fail. Not sure what to do about that...
Upvotes: 1