Sebastien Lorber

Reputation: 92210

Library to reproduce the Java primitives hashCode logic in C / C++ and other languages

I would like to know whether there is a multi-language library, or something similar, that would give me the following result:

What I'd like to know is: how can I easily get the hash code 78911 in my C program? Since each language can provide its own hash algorithm for a String, how can I handle that?


I'm asking this in the context of Distributed Hash Tables (datagrids, distributed caches, NoSQL...). I'm planning to create something like a very simple C client for a proprietary Java datagrid.

This is my use case for now, but for my project I will need hash algorithms that are compatible across multiple languages:

- Java hash algorithm in Ruby
- C# hash algorithm in Java
- C++ hash algorithm in Java
- Java hash algorithm in C++
- Java hash algorithm in Erlang

In every case, the original algorithm and its port in the other language must produce exactly the same hash value.

And if possible, I'd like to extend the concept to primitive types and "simple structures", not just Strings.


Does anyone know of a tool that handles my use case?


Edit: for Jim Balter

My use case is:

I have a proprietary partitioning/datagrid technology called GemFire, written in Java. It acts as a distributed hashmap. The number of buckets in the hashmap is fixed. For each map key, it computes the key's hash code and applies a modulo to determine which bucket the key belongs to.

For example, if I have 113 buckets (the default number of buckets in GemFire) and my map key is the String "Key":

"Key".hashCode() % 113 = 69

Thus GemFire knows that "Key" belongs to bucket 69.
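As a minimal sketch of that bucket step in C, assuming the Java hash code is already available as a 32-bit signed integer: working the documented String formula by hand for "Key" gives 75327, and 75327 % 113 is 69, matching the value above. The function name bucket_for_hash is hypothetical, and how GemFire itself treats negative hash codes is not stated here, so the sketch simply mirrors Java's % operator.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: maps a Java-style 32-bit hash code to a bucket
 * index with the same remainder semantics as Java's % operator.
 * C99 and Java both truncate integer division toward zero, so a
 * negative hash yields a negative (or zero) result in both languages. */
int32_t bucket_for_hash(int32_t hash, int32_t bucket_count)
{
    assert(bucket_count > 0);
    return hash % bucket_count;
}
```

Calling bucket_for_hash(75327, 113) returns 69. A real client would also have to decide what to do with negative results, in whatever way the server does.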

Now i have a C application:

So if you know how to do that without having to write/use a Java hashcode port in C, please tell me.

Edit: to avoid confusion: I'm not looking for anything else. But Jim Balter, you suggested that I do not need what I claim to need, so tell me if you see any other solution besides using, as you said, a custom or popular hash algorithm.

And in the future I may need to do the same for an Erlang partitioning application with a C# client application, and for other languages!


Edit: I would like to avoid using a non-Java hash algorithm (someone suggested md5/sha1 or any faster, non-security-oriented hash algorithm). This is because my solution is meant to be deployed on legacy distributed systems, often written in Java, which already contain a lot of data, and any change in the hash algorithm would require a heavy data migration. However, I keep this solution in mind, since it could be a good second option for people starting a new distributed system from scratch or willing to migrate their data.


So in the end, I am not looking for people to tell me to implement the Java String hash algorithm in C; I already know I can do that, thanks! I want to know whether someone has already done it, and not only by porting all the primitive Java hash algorithms to C, but also to other languages, and from other languages! I'm looking for a multi-language library that provides, in each language, ports of the other languages' hash algorithms.

Thus if there were only 3 languages on earth (C, Java and Python), my question would be: is there any polyglot library that provides:

For all available primitive types, and possibly basic structures. If a given language has no "default hash algorithm", then the most widely used one can be considered the language's algorithm.

Do you see what I mean? I want to know if there is a LIBRARY! I know I can look in the JDK or the specification and implement it on my own, but as I'm targeting a large number of languages and I don't know how to code in every one of them, I'd like someone to have done it for me and made it available as a free-to-use, open-source project!

Upvotes: 2

Views: 1422

Answers (2)

Jim Balter

Reputation: 16424

The algorithm for calculating the hash code of a Java string is quite simple and is documented as part of the public specification: http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/String.html#hashCode()

The hash code for a String object is computed as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
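To illustrate, the formula above can be sketched in C using Horner's rule. This is only a sketch under stated assumptions: the function name java_string_hash is made up for this example, and it assumes the input is a NUL-terminated ASCII string, so that each byte corresponds to exactly one Java char. Non-ASCII text would first need converting to UTF-16 code units to match Java.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the documented String.hashCode() formula.
 * Assumption: s is NUL-terminated ASCII, so one byte == one Java char. */
int32_t java_string_hash(const char *s)
{
    assert(s != NULL);

    /* Unsigned arithmetic wraps modulo 2^32, which mirrors the
     * overflow behaviour of Java's int arithmetic. */
    uint32_t h = 0;
    for (; *s != '\0'; ++s) {
        /* Horner's rule: h = s[0]*31^(n-1) + ... + s[n-1] */
        h = 31u * h + (unsigned char)*s;
    }

    /* Reinterpret as signed; mainstream compilers use two's-complement
     * wrapping for this conversion, matching Java's signed int. */
    return (int32_t)h;
}
```

With this sketch, java_string_hash("Key") evaluates to 75327 and java_string_hash("") to 0, in line with the specification quoted above.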

Note also that String is a final class so its methods cannot be overridden; thus, you are guaranteed that the given algorithm is correct for any Java String.

For languages other than Java, if the language does not specify the hash algorithm (and Java is unusual in doing so), then you cannot be sure that the hash algorithm won't change, even if you can ascertain it. I suspect that you do not actually need what you claim you need, but you would have to say more about your requirements (as opposed to what you think would address them).

Upvotes: 0

Yair Zaslavsky

Reputation: 4137

I would add that you can browse the OpenJDK source code and see the hashCode implementation. However, bear in mind that, as Jim Garrison suggested in a comment, different classes may override hashCode, so you will have to follow the implementation for each class. For hashing Strings, I would suggest using well-known hash functions such as SHA-1, or maybe MD5; you can find implementations in Java, C/C++ and other programming languages.

Upvotes: 1
