Andriy Mytroshyn

Reputation: 221

The best way to store a hash in a database

I need to find the best way, from both a performance and a storage point of view, to store a hash such as MD5. The current database is MariaDB; in the future it could be Oracle. The table will contain hundreds of millions or even billions of records, and each record will include one hash value that will be used for searching. At the moment I store the hash in a varchar(32), but I suspect this type is not optimal. I am also considering types such as binary and char: as I understand it, for fixed-size values like a hash, it is better to use char instead of varchar and binary instead of varbinary. I have also thought about converting the hash to a number; would that be better? So what is the best way to store a hash in a database?
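A quick way to see the size trade-off the question asks about, using Python's `hashlib` (the sample input `b"example"` is arbitrary): the same MD5 hash is 16 raw bytes, 32 characters as hex text, or an unsigned number of up to 128 bits. The number form can need up to 39 decimal digits, so it does not fit a 64-bit integer type such as BIGINT; it would have to be split across columns or kept in a wide decimal, which is unlikely to beat 16 raw bytes.

```python
import hashlib

BIGINT_UNSIGNED_MAX = 2**64 - 1  # largest value a 64-bit unsigned integer column can hold
MD5_MAX = 2**128 - 1             # largest possible 128-bit MD5 value

h = hashlib.md5(b"example")          # arbitrary sample input
raw = h.digest()                     # 16 raw bytes      -> BINARY(16) / RAW(16)
hex_text = h.hexdigest()             # 32 hex characters -> CHAR(32) / VARCHAR(32)
number = int.from_bytes(raw, "big")  # the same hash as one unsigned integer

print(len(raw), "bytes as binary")
print(len(hex_text), "characters as hex text")
print(len(str(MD5_MAX)), "decimal digits needed in the worst case")
print(MD5_MAX > BIGINT_UNSIGNED_MAX)  # True: does not fit a 64-bit integer
```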

Upvotes: 4

Views: 11914

Answers (3)

Rick James

Reputation: 142198

MySQL/MariaDB: BINARY(16). It occupies 16 bytes, exactly the size of an MD5 digest. Clearly you need an INDEX on the column.

But let me point out a design flaw in using hashes...

If you have a billion rows, but cannot cache more than a fraction of them, then any lookup is very likely to require a disk hit. This is because of the randomness of MD5 (or UUID or ...). The in-RAM cache (InnoDB's buffer pool, in the case of MySQL/MariaDB) is unlikely to have the block containing the next value you need.
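A rough illustration of that randomness, assuming a hypothetical table whose rows would otherwise get sequential ids 0..999: sort the MD5 keys the way a B-tree index orders them and measure how far apart consecutive inserts land. With sequential ids every new row goes to the end of the index (distance 1); with MD5 keys consecutive rows land at scattered positions, which is why they rarely find their block already cached. This models key order only, not real page or buffer-pool behavior.

```python
import hashlib

# Hypothetical rows that a sequence would number 0..999.
ids = list(range(1000))

# MD5 key each row would get (16 raw bytes, as stored in BINARY(16)).
keys = [hashlib.md5(str(i).encode("ascii")).digest() for i in ids]

# A B-tree keeps entries in key order: compute each row's rank in that order.
ranked = sorted(ids, key=lambda i: keys[i])
rank = {row: pos for pos, row in enumerate(ranked)}

# Distance in the index between each row and the row inserted just before it.
jumps = [abs(rank[i] - rank[i - 1]) for i in ids[1:]]
avg_jump = sum(jumps) / len(jumps)

print("sequential ids: average distance 1")
print(f"MD5 keys: average distance {avg_jump:.0f} of {len(ids)} positions")
```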

Do the math. How fast can a disk block (that is not cached) be read? A little bit of arithmetic on that gives you how few reads/second you can perform. A spinning drive: 10ms --> 100 reads/sec. Multi-threading will not help. RAID striping will help, some.
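The arithmetic above, written out with the answer's figure of roughly 10 ms per uncached read on a spinning drive (an assumed round number; SSDs are much faster). The last line shows why this matters at this scale: even a full day of back-to-back random reads touches well under 1% of a billion rows.

```python
SECONDS_PER_DAY = 86_400
READ_TIME_S = 0.010  # assumed ~10 ms per uncached random read (spinning drive)

reads_per_second = round(1 / READ_TIME_S)           # 100 reads/sec
reads_per_day = reads_per_second * SECONDS_PER_DAY  # 8,640,000 reads/day

print(reads_per_second, "reads/sec")
print(f"{reads_per_day:,} random reads per day")
print(f"{reads_per_day / 1_000_000_000:.2%} of a billion rows")
```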

Similarly, INSERTs are limited to about the same rate. Early in inserting a billion rows, things will be fast thanks to caching; later it will slow down to about 100 rows/sec. Inserting a billion rows will take months.
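A back-of-the-envelope version of that estimate, pessimistically assuming every insert runs at the slow, uncached rate of 100 rows/sec. The early, cached inserts go faster, so this is an upper-end figure rather than a prediction.

```python
ROWS = 1_000_000_000
ROWS_PER_SECOND = 100  # assumed steady-state rate once the index no longer fits in cache
SECONDS_PER_DAY = 86_400

seconds = ROWS / ROWS_PER_SECOND  # 10,000,000 seconds
days = seconds / SECONDS_PER_DAY
months = days / 30                # rough 30-day months

print(f"{days:.0f} days, about {months:.1f} months")
```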

PARTITIONing will not improve performance.

You may need code (either in SQL or your app) to convert between whatever the function delivers and BINARY (which is similar to BLOB).
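On the application side, the conversion is a few lines. A minimal sketch in Python; the helper names are hypothetical. In SQL, MariaDB's built-in `HEX()` and `UNHEX()` (for example `UNHEX(MD5(...))`) and Oracle's `RAWTOHEX` and `HEXTORAW` do the same job.

```python
import hashlib


def md5_to_binary16(data: bytes) -> bytes:
    """Raw 16-byte MD5 digest, ready to bind to a BINARY(16) / RAW(16) column."""
    return hashlib.md5(data).digest()


def hex_to_binary16(hex_digest: str) -> bytes:
    """Convert a 32-character hex MD5 (e.g. an existing VARCHAR(32) value) to 16 bytes."""
    raw = bytes.fromhex(hex_digest)
    if len(raw) != 16:
        raise ValueError("expected a 32-character MD5 hex string")
    return raw


def binary16_to_hex(raw: bytes) -> str:
    """Convert the 16 stored bytes back to the usual lowercase hex form for display."""
    return raw.hex()


stored = md5_to_binary16(b"hello")
print(binary16_to_hex(stored))  # 5d41402abc4b2a76b9719d911017c592
```

Migrating existing VARCHAR(32) data would use `hex_to_binary16` on each value (or `UNHEX()` in a single UPDATE), halving the key size.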

Upvotes: 1

Marmite Bomber

Reputation: 21043

The short answer is that each datatype should be stored in the native format supported by the RDBMS.

For Oracle, that is RAW(16) for an MD5 hash code.

Consider an analogy: some have decided to store DATE columns in VARCHAR format. You gain database independence, but you can't use any of the functions your RDBMS provides for DATE columns.

In any case, you should carefully consider why you want a HASH column in the database at all.

If it serves as a quick way to detect that some of a row's columns have changed, it could save you a lot of coding and processing.
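A minimal sketch of that change-detection use, with a hypothetical helper `row_fingerprint`: hash all of a row's column values into one 16-byte value, store it, and compare fingerprints instead of comparing every column. Each value is length-prefixed so that, for example, `("ab", "c")` and `("a", "bc")` do not produce the same input. MD5 is adequate for detecting accidental changes, not for anything security-related.

```python
import hashlib


def row_fingerprint(row: tuple) -> bytes:
    """16-byte MD5 over a row's column values (hypothetical helper)."""
    h = hashlib.md5()
    for value in row:
        encoded = repr(value).encode("utf-8")
        h.update(len(encoded).to_bytes(4, "big"))  # length prefix keeps column boundaries unambiguous
        h.update(encoded)
    return h.digest()


stored = row_fingerprint((42, "Alice", "alice@example.com"))
incoming = row_fingerprint((42, "Alice", "alice@example.org"))
changed = stored != incoming
print(changed)  # True
```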

If you want to use a hash code as a key, try to find at least one reason why you would profit from a HASH key compared with a sequence-generated key.

Be careful not to use a HASH as a key only because some source recommends it, before you have seen the described benefit in your own implementation.

Upvotes: 3

MT0

Reputation: 167774

In Oracle, use the RAW data type for binary data up to 2000 bytes (32767 bytes when MAX_STRING_SIZE = EXTENDED) and BLOB for larger values.

If your hash function generates a number then you can use the UTL_RAW.CAST_FROM_NUMBER function to convert it to a RAW data type.
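If the conversion is done in the application instead, a minimal client-side sketch (hypothetical helper names) packs a non-negative integer hash of up to 128 bits into exactly 16 big-endian bytes, so every value has the same width as a RAW(16) column. This is an alternative to `UTL_RAW.CAST_FROM_NUMBER`, not a reimplementation of it. Big-endian, fixed-width packing also keeps byte-wise ordering consistent with numeric ordering.

```python
def int_hash_to_raw16(value: int) -> bytes:
    """Pack a non-negative integer hash (up to 128 bits) into 16 big-endian bytes.

    int.to_bytes raises OverflowError for negative values or values wider than 128 bits.
    """
    return value.to_bytes(16, "big")


def raw16_to_int_hash(raw: bytes) -> int:
    """Unpack 16 stored bytes back into the integer hash."""
    return int.from_bytes(raw, "big")


packed = int_hash_to_raw16(255)
print(packed.hex())  # 000000000000000000000000000000ff
```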

Upvotes: 0
