Reputation: 25852
I simply want to convert a string of any length to an integer value. The mapping can be unique or even non-unique. Is there an existing open-source command that does this?
Bonus points if it is unique, such as computing the lexicographical order via a bash command.
Upvotes: 13
Views: 18774
Reputation: 241671
You need to be careful about using hash functions from common programming languages. It has become common to introduce randomized seeds into hash functions, so that hash values are consistent only within a single program execution. This avoids a denial-of-service attack noted in oCERT advisory 2011-003. (As that advisory notes, the problem was described in 2003 in a paper presented at Usenix.)
For example, the Python hash function has been randomized by default since v3.3:
$ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
-2595772619214671013
$ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
-6001956461950650533
$ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
-7414807274805087300
$ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
-327608370992723225
# Python2 generates consistent hash values
$ python -c 'from sys import argv;print(hash(argv[1]))' abc
1453079729188098211
$ python -c 'from sys import argv;print(hash(argv[1]))' abc
1453079729188098211
$ python -c 'from sys import argv;print(hash(argv[1]))' abc
1453079729188098211
You can control hash randomization in Python by setting the PYTHONHASHSEED environment variable.
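As a rough sketch of that, assuming a python3 on your PATH: setting PYTHONHASHSEED to a fixed integer (0 disables randomization altogether) makes hash() repeatable from one run to the next. The helper name py_hash is made up for illustration.

```shell
# Hypothetical helper (the name py_hash is an assumption, not a standard tool):
# hash a string with Python's built-in hash(), with randomization pinned
# so the value is repeatable across runs.
py_hash() {
    PYTHONHASHSEED=0 python3 -c 'from sys import argv; print(hash(argv[1]))' "$1"
}

# Prints the same integer every time on a given Python build.
py_hash abc
```

The value is only stable for a given Python build; the underlying string hash algorithm has changed between Python versions, so don't expect the same number on every machine.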
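Or you can use a standardized cryptographic hash like SHA-1. The commonly available sha1sum utility outputs its result in hexadecimal, but you can convert that to decimal with bash (truncated to 64 bits). When reading standard input, sha1sum prints the digest followed by two spaces and a "-", so appending 0 turns the output into the arithmetic expression "digest - 0":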
$ echo $((0x$(sha1sum <<<"string to hash")0))
-7037254581539467098
or in its full 160-bit glory, using bc (which requires hex to be written in upper-case):
$ bc <<<ibase=16\;$(sha1sum <<<"string to hash"|tr a-z A-Z)0
861191872165666513280590001082621748432296579238
If you only need a hash value below some power of 16, you can use the first few hex digits of the SHA-1 sum. (You could use any selection of digits -- they're all equally well distributed -- but the first few are easier to extract):
$ echo $((0x$(sha1sum <<<"string to hash"|cut -c1-2)))
150
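If the range you need is not a power of 16, one possible approach is to take a larger slice of the digest and reduce it modulo the range. The sketch below assumes sha1sum is available; the function name hash_bucket and its arguments are invented for the example. It feeds the string through printf, so unlike the here-string examples it does not hash a trailing newline.

```shell
# Hypothetical helper: map a string to a bucket number in [0, n).
# Uses the first 15 hex digits (60 bits) of the SHA-1 sum so the value
# stays non-negative in signed 64-bit shell arithmetic, then takes it modulo n.
hash_bucket() {
    local str=$1 n=$2 hex
    hex=$(printf '%s' "$str" | sha1sum | cut -c1-15)
    echo $(( 0x$hex % n ))
}

# Prints an integer between 0 and 999 for this string.
hash_bucket "string to hash" 1000
```

With 60 bits of input, the bias introduced by the modulo is negligible for any realistic bucket count.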
Note: As @gniourf_gniourf points out in a comment, the above doesn't really compute the SHA-1 checksum of the given string because the bash here-string syntax (<<<word) appends a newline to word. Since the checksum of the string with a newline appended is just as good a hash as the checksum of the string itself, there is no problem as long as you always use the same mechanism to produce the hash.
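If you do need the checksum of exactly the bytes in the string, for instance to compare against a digest computed elsewhere, one way is to pipe the string through printf instead of using a here-string. This is only a sketch; the function name sha1_exact is made up for the example.

```shell
# Hypothetical helper: SHA-1 of exactly the bytes of the argument,
# with no trailing newline added.
sha1_exact() {
    printf '%s' "$1" | sha1sum | cut -d ' ' -f 1
}

# The standard SHA-1 test vector for "abc":
sha1_exact abc    # a9993e364706816aba3e25717850c26c9cd0d89d

# With an explicit newline the input differs, and so does the digest.
printf '%s\n' abc | sha1sum | cut -d ' ' -f 1
```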
Upvotes: 18
Reputation: 39354
You could use the sum or cksum command (the latter being preferred) to generate a base-10 integer:
$ cksum <<< 'hello world' | cut -f 1 -d ' '
3733384285
$ cksum <<< 'goodbye world' | cut -f 1 -d ' '
2600070097
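For repeated use, the pipeline can be wrapped in a small function. This is only a sketch, and the name str_to_int is invented here. It uses printf '%s\n' so that cksum sees the same bytes as with the here-string form (which appends a newline).

```shell
# Hypothetical helper: map a string to a decimal integer using the cksum CRC.
# printf '%s\n' reproduces the trailing newline that <<< adds.
str_to_int() {
    printf '%s\n' "$1" | cksum | cut -d ' ' -f 1
}

str_to_int 'hello world'
```

The result is a 32-bit value (0 to 4294967295), so distinct strings can collide; it is a hash, not a unique mapping.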
If you're interested in the math behind these simple hashes, check out the source implementations of cksum and sum (for sum, see the -r and -s command-line arguments).
Upvotes: 20