tpngr999

Reputation: 21

Java: how to hash a string with low collision probability and specify the characters allowed in the output

Is there any way to hash a string and specify the characters allowed in the output, or is there a better approach to avoiding collisions when producing a hash of 8 characters in length?

I am seeing collisions with my current hashing method (see the example implementation below). I am currently using crc32 from https://guava.dev/releases/20.0/api/docs/com/google/common/hash/Hashing.html

The hashes produced are alphanumeric and 8 characters in length. I need to keep the 8-character length (I am not storing passwords). Is there a way to specify an "alphabet" of allowed output characters for a hashing function?

For example, to allow a-z and 0-9 plus a set of extra characters such as _, $ and -. Any characters added will need to be URI friendly.

This would allow me to decrease the possibility of collisions occurring.
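
To put rough numbers on that, here is a minimal sketch using the birthday approximation p ≈ 1 - exp(-n(n-1)/(2N)), where n is the number of live entries and N is the number of possible outputs. The class name, the 39-symbol alphabet size (a-z, 0-9 and three extra characters) and the figure of 100,000 entries are assumptions for illustration only, not values from the real system.

```java
final class CollisionEstimate {

    // Birthday approximation: probability that at least two of `items`
    // uniformly distributed hashes collide in a space of `space` values.
    static double collisionProbability(double items, double space) {
        return -Math.expm1(-items * (items - 1) / (2.0 * space));
    }

    public static void main(String[] args) {
        double hexSpace = Math.pow(16, 8);   // 8 hex characters = 2^32 values (crc32)
        double widerSpace = Math.pow(39, 8); // 8 characters from a 39-symbol alphabet
        double items = 100_000;              // hypothetical number of live cache entries

        System.out.printf("16-symbol alphabet: %.6f%n", collisionProbability(items, hexSpace));
        System.out.printf("39-symbol alphabet: %.6f%n", collisionProbability(items, widerSpace));
    }
}
```

The estimate assumes the hash output is uniformly distributed. It only shows how the size of the output space affects the odds.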

The hash output will be stored in a cache for a maximum of 60 days, so collisions occurring after that period will have no effect.

Example code for the current approach:

import com.google.common.hash.HashFunction;
import com.google.common.hash.Hasher;
import com.google.common.hash.Hashing;

public class Test {
        private static final String SALT = "4767c3a6-73bc-11ec-90d6-0242ac120003";

        public static void main( String[] args )
        {
            // actual strings causing collisions removed as have to redact some data
            String string1 = "myStringOne";
            String string2 = "myStringTwo";

            System.out.println( "string1:" + string1);
            System.out.println( "string1 hashed:" + doHash(string1, SALT));
            System.out.println( "string2:" + string2);
            System.out.println( "string2 hash:" + doHash(string2, SALT));
        }

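        // Hashes the key followed by the salt with CRC32 (a 32-bit checksum),
        // so toString() yields 8 lowercase hex characters.
        private static String doHash(String keyValue, String salt){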
            HashFunction func = Hashing.crc32();
            Hasher hasher = func.newHasher();
            hasher.putUnencodedChars(keyValue);
            hasher.putUnencodedChars(salt);
            return hasher.hash().toString();
        }
}

How the code is used (the problem statement), with a key store DB: a user requests a resource, and a hash is made of the user details and the requested resource. If the resulting ID is already present, that item is returned from the DB.

Otherwise, the resource is processed and stored in the DB, with the hash result as its ID.

The cache is purged periodically.
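
The flow above can be sketched roughly as follows. This is an illustration under assumptions, not the real code: a ConcurrentHashMap stands in for the key store DB, and the ResourceCache class, the fetch and process methods, the "|" separator and the placeholder hash function are all hypothetical names and choices.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.UnaryOperator;

final class ResourceCache {
    // In-memory stand-in for the key store DB (assumption for illustration).
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final UnaryOperator<String> hashFunction;
    final AtomicInteger processedCount = new AtomicInteger();

    ResourceCache(UnaryOperator<String> hashFunction) {
        this.hashFunction = hashFunction;
    }

    // Returns the stored item if the ID exists, otherwise processes and stores it.
    String fetch(String userDetails, String resource) {
        String id = hashFunction.apply(userDetails + "|" + resource);
        return store.computeIfAbsent(id, unused -> process(resource));
    }

    // Placeholder for the real (expensive) processing step.
    private String process(String resource) {
        processedCount.incrementAndGet();
        return "processed:" + resource;
    }

    public static void main(String[] args) {
        // Placeholder hash; the real code would call doHash(...) here.
        ResourceCache cache = new ResourceCache(s -> Integer.toHexString(s.hashCode()));
        System.out.println(cache.fetch("user-1", "/report/42"));
        System.out.println(cache.fetch("user-1", "/report/42"));
        System.out.println("processed " + cache.processedCount.get() + " time(s)");
    }
}
```

The sketch also shows why a collision matters here: two different (user, resource) pairs that hash to the same ID would receive the same stored item.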

Questions: Is there a way to specify the alphabet the hash is allowed to use in its output? I checked the docs but do not see an approach: https://guava.dev/releases/20.0/api/docs/com/google/common/hash/Hashing.html

Or is there an alternative approach that would be recommended? e.g. generating a longer hash and taking a subset.
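
As a sketch of that "longer hash, then take a subset" idea: compute a SHA-256 digest, treat it as a large integer, and write out 8 digits of it in a custom alphabet. This uses the JDK's MessageDigest rather than Guava to stay dependency-free (Guava's Hashing.sha256() should serve equally well). The ShortHash class, the shortHash method and the ALPHABET constant are hypothetical names, and the alphabet itself is an assumption: a-z, 0-9, '-' and '_', leaving out '$' because RFC 3986 lists it as a reserved character.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ShortHash {
    // Assumed alphabet: 38 symbols, all unreserved in URIs.
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-_";
    static final int LENGTH = 8;

    static String shortHash(String keyValue, String salt) {
        byte[] digest;
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            sha256.update(keyValue.getBytes(StandardCharsets.UTF_8));
            sha256.update(salt.getBytes(StandardCharsets.UTF_8));
            digest = sha256.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }

        // Interpret the 256-bit digest as a non-negative integer and
        // peel off LENGTH digits in base ALPHABET.length().
        BigInteger value = new BigInteger(1, digest);
        BigInteger base = BigInteger.valueOf(ALPHABET.length());
        StringBuilder out = new StringBuilder(LENGTH);
        for (int i = 0; i < LENGTH; i++) {
            BigInteger[] quotientAndRemainder = value.divideAndRemainder(base);
            out.append(ALPHABET.charAt(quotientAndRemainder[1].intValue()));
            value = quotientAndRemainder[0];
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String salt = "4767c3a6-73bc-11ec-90d6-0242ac120003";
        System.out.println(shortHash("myStringOne", salt));
        System.out.println(shortHash("myStringTwo", salt));
    }
}
```

With 38 symbols there are 38^8 (about 4.3 trillion) possible 8-character outputs, roughly a thousand times more than the 2^32 values of crc32. Collisions remain possible, only less likely.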

Upvotes: 2

Views: 1191

Answers (0)
