Is there a way to hash a string and specify which characters are allowed in the output? Or is there a better approach to avoiding collisions when producing a hash that is 8 characters long?
I am seeing a collision with my current hashing method (see the example implementation below). I am currently using crc32 from Guava's Hashing class: https://guava.dev/releases/20.0/api/docs/com/google/common/hash/Hashing.html
The hashes produced are alphanumeric (hexadecimal, so only 0-9 and a-f) and 8 characters long. I need to keep the 8-character length (these are not stored passwords). Is there a way to specify an "alphabet" of allowed output characters for a hashing function?
For example, allowing a-z and 0-9 plus a set of extra characters such as _, $ and -. Any added characters need to be URI-friendly.
A larger alphabet would let me decrease the probability of collisions.
The hash output is stored in a cache for a maximum of 60 days, so collisions occurring after that period have no effect.
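As a rough check on how much a larger alphabet could help, the birthday approximation estimates the chance of at least one collision among n IDs drawn uniformly from N possible values. The sketch below is illustrative only: the figure of 100,000 live cache entries and the 38-symbol alphabet (a-z, 0-9, _ and -) are assumptions, not values from the real system. A CRC32 value has only 32 bits, so a wider alphabet only helps if the underlying hash is wider as well.

```java
public class CollisionEstimate {

    // Birthday approximation: P(at least one collision) ~= 1 - exp(-n(n-1) / (2N)),
    // where n is the number of stored IDs and N is the number of possible IDs.
    static double collisionProbability(double items, double space) {
        // -expm1(-x) equals 1 - e^(-x) and stays accurate for small x
        return -Math.expm1(-items * (items - 1) / (2 * space));
    }

    public static void main(String[] args) {
        double liveEntries = 100_000;          // hypothetical number of IDs in the cache at once
        double hexSpace = Math.pow(16, 8);     // 8 hex characters, i.e. 32 bits
        double widerSpace = Math.pow(38, 8);   // 8 characters from an assumed 38-symbol alphabet

        System.out.printf("8 hex chars:        %.4f%n",
                collisionProbability(liveEntries, hexSpace));
        System.out.printf("8 chars of 38 syms: %.4f%n",
                collisionProbability(liveEntries, widerSpace));
    }
}
```

Under these assumed numbers the estimate is roughly two in three for the hex IDs and roughly one in a thousand for the 38-symbol IDs, so the alphabet size matters a great deal at this length.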
Example code for the current approach:
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hasher;
import com.google.common.hash.Hashing;

public class Test {

    private static final String SALT = "4767c3a6-73bc-11ec-90d6-0242ac120003";

    public static void main(String[] args) {
        // actual strings causing collisions removed as have to redact some data
        String string1 = "myStringOne";
        String string2 = "myStringTwo";
        System.out.println("string1:" + string1);
        System.out.println("string1 hashed:" + doHash(string1, SALT));
        System.out.println("string2:" + string2);
        System.out.println("string2 hash:" + doHash(string2, SALT));
    }

    private static String doHash(String keyValue, String salt) {
        HashFunction func = Hashing.crc32();
        Hasher hasher = func.newHasher();
        hasher.putUnencodedChars(keyValue);
        hasher.putUnencodedChars(salt);
        return hasher.hash().toString();
    }
}
How the code is used (with a key-value store DB): a user requests a resource, and a hash is made of the user details and the requested resource. If the resulting ID is already present, that item is returned from the DB.
Otherwise, the resource is processed and the result is stored in the DB, with the hash as its ID.
The cache is purged periodically.
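The lookup flow described above can be sketched roughly as follows. This is not the real implementation: `CacheSketch`, `Entry` and `getOrCompute` are hypothetical names, a `HashMap` stands in for the DB, and the ID function is passed in so any hashing scheme can be plugged in. The one addition to the flow as described is that the original key is stored next to the value, so a collision is detected instead of silently returning another user's item.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class CacheSketch {

    // Hypothetical stored record: keeps the original key alongside the processed value
    static final class Entry {
        final String originalKey;
        final String value;

        Entry(String originalKey, String value) {
            this.originalKey = originalKey;
            this.value = value;
        }
    }

    private final Map<String, Entry> store = new HashMap<>(); // stand-in for the DB
    private final Function<String, String> idFunction;        // e.g. the 8-character hash

    public CacheSketch(Function<String, String> idFunction) {
        this.idFunction = idFunction;
    }

    public String getOrCompute(String key, Function<String, String> processor) {
        String id = idFunction.apply(key);
        Entry existing = store.get(id);
        if (existing != null) {
            if (existing.originalKey.equals(key)) {
                return existing.value; // genuine cache hit
            }
            // Same ID, different input: a hash collision
            throw new IllegalStateException("ID collision on " + id);
        }
        String value = processor.apply(key); // process the resource
        store.put(id, new Entry(key, value));
        return value;
    }

    public static void main(String[] args) {
        // Deliberately weak ID function (first character only) to provoke a collision
        CacheSketch cache = new CacheSketch(k -> k.substring(0, 1));
        System.out.println(cache.getOrCompute("alpha", String::toUpperCase));
        System.out.println(cache.getOrCompute("alpha", String::toUpperCase));
    }
}
```

Throwing on a collision is only one option; retrying with a different salt or falling back to a longer ID would fit the same structure.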
Questions: Is there a way to specify the alphabet that the hash is allowed to use in its output? I checked the docs but do not see one: https://guava.dev/releases/20.0/api/docs/com/google/common/hash/Hashing.html
Or is there an alternative approach that you would recommend, e.g. generating a longer hash and taking a subset of it?
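One possible shape for the "longer hash, then shorten" idea is sketched below, under stated assumptions rather than as a definitive implementation. It uses the JDK's `MessageDigest` with SHA-256 instead of Guava, treats the digest as one large non-negative integer, and writes its lowest 8 base-38 digits using a custom alphabet. The class name `AlphabetHash`, the alphabet (a-z, 0-9, _ and -) and the NUL separator between key and salt are all assumptions made for illustration. On URI-friendliness: RFC 3986 lists letters, digits, `-`, `.`, `_` and `~` as unreserved, while `$` is a reserved sub-delimiter, so it is left out here.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class AlphabetHash {

    // Assumed output alphabet: 38 symbols, all unreserved in URIs
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789_-";
    static final int LENGTH = 8;

    static String hash(String keyValue, String salt) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(keyValue.getBytes(StandardCharsets.UTF_8));
            // Separator so that ("ab", "c") and ("a", "bc") feed different bytes;
            // assumes the inputs themselves contain no NUL character
            md.update((byte) 0);
            md.update(salt.getBytes(StandardCharsets.UTF_8));

            // Interpret the 256-bit digest as a non-negative integer
            BigInteger n = new BigInteger(1, md.digest());
            BigInteger base = BigInteger.valueOf(ALPHABET.length());

            // Emit the lowest LENGTH digits of n in base ALPHABET.length()
            StringBuilder sb = new StringBuilder(LENGTH);
            for (int i = 0; i < LENGTH; i++) {
                BigInteger[] quotientAndRemainder = n.divideAndRemainder(base);
                sb.append(ALPHABET.charAt(quotientAndRemainder[1].intValue()));
                n = quotientAndRemainder[0];
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String salt = "4767c3a6-73bc-11ec-90d6-0242ac120003";
        System.out.println(hash("myStringOne", salt));
        System.out.println(hash("myStringTwo", salt));
    }
}
```

With 38 symbols, 8 characters give 38^8 (about 4.3 x 10^12, or roughly 42 bits) possible IDs, compared with 2^32 (about 4.3 x 10^9) for CRC32 printed as hex. Collisions remain possible at any fixed length, so the stored item should still be checked against the original input.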