Reputation: 3856
I need to find a random color, given a specific seed number - fast. given the same ID twice, should return the same color.
I did this:
def id_to_random_color(number):
random_bytes = hashlib.sha1(bytes(number)).digest()
return [int(random_bytes[-1]) / 255, int(random_bytes[-2]) / 255, int(random_bytes[-3]) / 255, 1.0]
The problem is that calculating the sha1 of numbers many times is very slow in total. (I'm using this function about 100k times)
Edit: The reason I use a hash function is that I want to have colors that are different for numbers that are close
e.g id_to_random_color(7)
should be very different from id_to_random_color(9)
Upvotes: 6
Views: 2157
Reputation: 55469
You didn't mention the range of number
. It must be a non-negative integer, otherwise bytes(number)
would fail. (BTW, that function returns a bytes
string consisting of number
zero bytes, which will chew up a lot of RAM if number
is large). I presume number
is at least 24 bits to cover the 24 bit RGB color space.
Using a cryptographic hash function for this purpose is overkill. OTOH, the hashlib
functions are quite fast, since they're coded in C. We could utilize the built_in hash
function, however hash(n)
simply returns n
for machine-size integers, so we'd need to do something like hash((n, n))
to get random-looking output. However, the results of doing that sort of thing aren't particularly random: hash
is designed for hashtable work, not for the kind of scrambling we want here.
To generate the random RGB values I've adapted the mixing algorithm from xxHash by Yann Collet. You can view the C source code of that algorithm in the xxhash.c source code. This algorithm is reasonably fast, and it has good avalanching. Bret Mulvey has written a good introductory article about hash mixing functions and the avalanche effect.
def id_to_random_color(n):
n = ((n ^ n >> 15) * 2246822519) & 0xffffffff
n = ((n ^ n >> 13) * 3266489917) & 0xffffffff
n = (n ^ n >> 16) >> 8
return [u / 255. for u in n.to_bytes(3, 'big')] + [1.0]
This function works well for n
in range(2**24)
, and in fact its results are quite good over the whole of range(2**32)
; it will still give usable results outside that range. To test it here, I'll use a simplified version that returns the RGB values as integers. The first test simply shows the RGB values for n
in range(20)
. The second test generates 25600 random numbers and finds the corresponding RGB values. We should get approximately 100 hits for each R, G, & B value.
from collections import Counter
from random import seed, randrange
seed(42)
def id_to_RGB(n):
n = ((n ^ n >> 15) * 2246822519) & 0xffffffff
n = ((n ^ n >> 13) * 3266489917) & 0xffffffff
n = (n ^ n >> 16) >> 8
return tuple(n.to_bytes(3, 'big'))
# Tests
# Show some colors
for i in range(20):
rgb = id_to_RGB(i)
print('{:2d}: {:02x} {:02x} {:02x}'.format(i, *rgb))
print()
# Count the frequency of each color for random `n`
counts = {k: Counter() for k in 'rgb'}
for i in range(25600):
n = randrange(2 ** 32)
for k, v in zip('rgb', id_to_RGB(n)):
counts[k][v] += 1
for k in 'rgb':
print(k, sorted(counts[k].values()))
output
0: 00 00 00
1: 60 6d 18
2: 4e f2 bf
3: 75 4f 48
4: 60 98 f1
5: 17 1d 98
6: 3b 69 13
7: aa 10 98
8: c1 31 e3
9: 1e fa 4a
10: 7f 05 b2
11: 86 0e b3
12: 39 84 c6
13: c1 75 4f
14: e2 38 87
15: db 54 79
16: 45 14 b6
17: cb 56 68
18: 8e bf d8
19: cd 50 3f
Counter output
r [74, 75, 75, 77, 78, 80, 80, 80, 80, 81, 82, 83, 84, 85, 85, 85, 86, 86, 86, 88, 88, 88, 88, 89, 89, 89, 89, 89, 89, 89, 89, 90, 90, 90, 90, 90, 90, 90, 91, 91, 91, 91, 91, 91, 91, 91, 91, 92, 92, 92, 92, 92, 92, 92, 92, 93, 93, 93, 93, 93, 93, 93, 93, 93, 94, 94, 94, 94, 94, 94, 94, 94, 94, 95, 95, 95, 95, 95, 95, 95, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 97, 97, 97, 97, 97, 97, 97, 97, 98, 98, 98, 98, 98, 98, 98, 98, 98, 99, 99, 99, 99, 99, 99, 99, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 103, 103, 103, 103, 103, 103, 103, 103, 103, 103, 103, 104, 104, 104, 104, 104, 104, 104, 105, 105, 105, 105, 105, 105, 105, 105, 105, 105, 106, 106, 106, 106, 106, 106, 106, 106, 106, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 108, 108, 108, 108, 108, 108, 108, 108, 109, 109, 109, 109, 109, 109, 109, 109, 109, 109, 110, 110, 110, 110, 110, 110, 110, 110, 111, 112, 112, 112, 112, 112, 113, 113, 113, 114, 114, 115, 115, 115, 115, 116, 116, 116, 116, 118, 119, 120, 123, 124, 126, 128, 138]
g [73, 74, 74, 77, 78, 79, 79, 81, 81, 82, 82, 83, 83, 83, 84, 84, 84, 84, 85, 85, 85, 86, 87, 87, 87, 87, 87, 87, 87, 88, 88, 88, 88, 88, 89, 89, 89, 89, 89, 89, 90, 90, 90, 90, 90, 90, 90, 90, 90, 91, 91, 91, 91, 92, 92, 92, 92, 92, 92, 92, 92, 92, 93, 93, 93, 93, 93, 93, 93, 93, 94, 94, 94, 94, 94, 94, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 98, 98, 98, 98, 98, 98, 98, 98, 99, 99, 99, 99, 99, 99, 99, 99, 99, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 101, 101, 101, 101, 101, 102, 102, 102, 102, 102, 102, 102, 102, 102, 103, 103, 103, 103, 103, 103, 104, 104, 104, 104, 104, 104, 104, 104, 104, 104, 105, 105, 105, 105, 105, 105, 105, 105, 105, 105, 105, 105, 105, 105, 106, 106, 106, 106, 106, 106, 106, 107, 107, 107, 107, 107, 107, 107, 107, 107, 108, 108, 108, 109, 109, 109, 110, 110, 110, 110, 110, 111, 111, 111, 111, 111, 111, 112, 112, 112, 112, 112, 112, 113, 113, 113, 113, 113, 113, 113, 113, 114, 114, 114, 114, 115, 115, 116, 117, 117, 117, 117, 118, 118, 118, 119, 119, 119, 120, 120, 121, 121, 121, 123, 125, 126, 128]
b [73, 74, 77, 78, 78, 79, 80, 80, 80, 81, 82, 84, 84, 84, 84, 84, 84, 84, 84, 85, 85, 85, 85, 85, 86, 86, 86, 86, 86, 87, 87, 87, 87, 88, 88, 89, 89, 89, 89, 89, 89, 89, 89, 90, 90, 90, 90, 90, 90, 91, 91, 91, 91, 91, 91, 92, 93, 93, 93, 93, 93, 93, 93, 94, 94, 94, 94, 94, 94, 94, 94, 94, 94, 95, 95, 95, 95, 95, 95, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 97, 97, 97, 97, 97, 97, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 101, 101, 101, 101, 101, 101, 101, 101, 102, 102, 102, 102, 102, 102, 102, 103, 103, 103, 103, 103, 103, 103, 103, 104, 104, 104, 104, 104, 104, 104, 104, 104, 104, 104, 104, 104, 105, 105, 105, 105, 105, 105, 105, 106, 106, 106, 106, 106, 106, 106, 107, 107, 107, 107, 107, 107, 107, 107, 107, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 109, 109, 109, 109, 109, 109, 109, 110, 110, 110, 111, 111, 111, 111, 112, 112, 112, 113, 113, 113, 114, 114, 114, 114, 114, 115, 115, 115, 115, 115, 115, 115, 115, 115, 116, 116, 116, 117, 118, 119, 120, 120, 122, 124, 126, 127, 128, 131]
You may notice that id_to_RGB
returns all zeroes for a zero input. If that's undesirable, you can add an extra mixing step at the start (also borrowed from xxHash).
def id_to_RGB(n):
n = (374761397 + n * 3266489917) & 0xffffffff
n = ((n ^ n >> 15) * 2246822519) & 0xffffffff
n = ((n ^ n >> 13) * 3266489917) & 0xffffffff
n = (n ^ n >> 16) >> 8
return tuple(n.to_bytes(3, 'big'))
Upvotes: 3
Reputation: 566
Using simple random number generators with kind of static variables could improve performance:
import random
prev, r, g, b = None, 0, 0, 0
def id_to_random_color(number):
global prev, r, g, b
if number != prev:
r = random.random()
g = random.random()
b = random.random()
prev = number
return r, g, b, 1.0
Update:
As AndrewMcDowell stated in his comment, the function could return different values if the input is repeated in not sequential occasions.
Here is a possible workaround:
import random
memory = {}
def id_to_random_color(number, memory):
if not number in memory:
r = random.random()
g = random.random()
b = random.random()
memory[number] = (r, g, b, 1.0)
return memory[number]
Further update:
The same function skeleton could be used even to calculate hash:
memory = {}
def id_to_random_color(number):
if not number in memory:
numByte = str.encode(number)
hashObj = hashlib.sha1(numByte).digest()
r, g, b = hashObj[-1] / 255.0, hashObj[-2] / 255.0, hashObj[-3] / 255.0
memory[number]= (r, g, b, 1.0)
return r, g, b, 1.0
else:
return memory[number]
Despite it is a bit more verbose syntax, the else
statement improves a bit performances, avoiding subsequent memory write and read (as Jake stated in his answer).
Upvotes: 3
Reputation: 637
I would use a dictionary to quickly index already generated seeds.
import random
random_seeds = {}
def id_to_random_color(number):
if number in random_seeds.keys():
return random_seeds[number]
else:
color = [random.random(), random.random(), random.random(), 1.0]
random_seeds[number] = color
return color
Upvotes: 0