Reputation:
Why is Hash-flooding attacks for integer hashmaps not fixed in Python yet? For example, I can make a large list of numbers such that every number collides with the previous number and this causes hashmap collision bomb hence increasing time complexity to o(n**2). This is not the case for strings since python uses "SipHash" algorithm, which is designed to be resistant to hash-flooding attacks.
Conside this code
import time
m = 2**61 - 1
n = 10000
from collections import Counter
collide = [m * i for i in range(1, n + 1)]
s = time.time()
c = Counter(collide)
e = time.time()
print(e-s)
another = [k for k in range(n)]
s = time.time()
c = Counter(another)
e = time.time()
print(e-s)
The collide array takes 1.4420311450958252 seconds while the normal array takes only 0.0005929470062255859 seconds. Is there any way to mitigate this problem since python doesnt allow for custom hash implementation?
Upvotes: 1
Views: 69
Reputation: 1071
Well, if you are sure you are more likely to be dealing with ints in the pathological category, then you could consider pre-treating your keys before putting them in the dict.
import time
m = 2**61 - 1
n = 10000
from collections import Counter
collide = [str(m * i) for i in range(1, n + 1)]
s = time.time()
c = Counter(collide)
e = time.time()
print(e-s)
another = [k for k in range(n)]
s = time.time()
c = Counter(another)
e = time.time()
print(e-s)
Gives:
0.0013701915740966797
0.0009481906890869141
This is effectively the same as you doing your own hash function. I guess you could apply whatever logic you want in here to transform the int into something more hashable.
e.g.
from collections import Counter
import time
m = 2**61 - 1
n = 10000
# This will determine the maximum int you can safely rotate and the speed of your rotation operation
bit_format_string = "{:080b}"
bit_shift = 62
# TODO: Figure out a better way to do rotations
def rotate_left(x, n):
return int(bit_format_string.format(x)[n:] + bit_format_string.format(x)[:n], 2)
def rotate_right(x, n):
return int(bit_format_string.format(x)[-n:] + bit_format_string.format(x)[:-n], 2)
collide = [m * i for i in range(1, n + 1)]
shifted = [rotate_right(i, bit_shift) for i in collide]
s = time.time()
c = Counter(shifted)
e = time.time()
print(e-s)
# This restores the original values and gets you your counts
counts = [(rotate_left(k, bit_shift), v) for (k, v) in c.items()]
another = [k for k in range(n)]
s = time.time()
c = Counter(another)
e = time.time()
print(e-s)
This one is fast too, and it gets the correct output. Result:
0.0010600090026855469
0.0006046295166015625
and this passes: assert set(t[0] for t in counts) == set(collide)
Upvotes: 0