user23546453
user23546453

Reputation:

Hash-flooding attacks for integer hashmaps in python

Why is Hash-flooding attacks for integer hashmaps not fixed in Python yet? For example, I can make a large list of numbers such that every number collides with the previous number and this causes hashmap collision bomb hence increasing time complexity to o(n**2). This is not the case for strings since python uses "SipHash" algorithm, which is designed to be resistant to hash-flooding attacks.

Conside this code

import time
m = 2**61 - 1  
n = 10000  
from collections import Counter

collide = [m * i for i in range(1, n + 1)]
s = time.time()
c = Counter(collide)
e = time.time()
print(e-s)

another = [k for k in range(n)]
s = time.time()
c = Counter(another)
e = time.time()
print(e-s)

The collide array takes 1.4420311450958252 seconds while the normal array takes only 0.0005929470062255859 seconds. Is there any way to mitigate this problem since python doesnt allow for custom hash implementation?

Upvotes: 1

Views: 69

Answers (1)

Bill Huneke
Bill Huneke

Reputation: 1071

Well, if you are sure you are more likely to be dealing with ints in the pathological category, then you could consider pre-treating your keys before putting them in the dict.

  • convert to string - in my tests, this simple workaround speeds you right up
import time
m = 2**61 - 1
n = 10000
from collections import Counter

collide = [str(m * i) for i in range(1, n + 1)]
s = time.time()
c = Counter(collide)
e = time.time()
print(e-s)

another = [k for k in range(n)]
s = time.time()
c = Counter(another)
e = time.time()
print(e-s)

Gives:

0.0013701915740966797
0.0009481906890869141
  • perform some other simple transform on the int and then remember to transform it back again before you use it. For example, since Python's int hash wants variation in the low bits, you could do a bit rotate before putting the int in the dict and then rotate back when you want to use it.

This is effectively the same as you doing your own hash function. I guess you could apply whatever logic you want in here to transform the int into something more hashable.

e.g.

from collections import Counter
import time

m = 2**61 - 1
n = 10000
# This will determine the maximum int you can safely rotate and the speed of your rotation operation
bit_format_string = "{:080b}"
bit_shift = 62


# TODO: Figure out a better way to do rotations
def rotate_left(x, n):
    return int(bit_format_string.format(x)[n:] + bit_format_string.format(x)[:n], 2)


def rotate_right(x, n):
    return int(bit_format_string.format(x)[-n:] + bit_format_string.format(x)[:-n], 2)


collide = [m * i for i in range(1, n + 1)]
shifted = [rotate_right(i, bit_shift) for i in collide]
s = time.time()
c = Counter(shifted)
e = time.time()
print(e-s)

# This restores the original values and gets you your counts
counts = [(rotate_left(k, bit_shift), v) for (k, v) in c.items()]

another = [k for k in range(n)]
s = time.time()
c = Counter(another)
e = time.time()
print(e-s)

This one is fast too, and it gets the correct output. Result:

0.0010600090026855469
0.0006046295166015625

and this passes: assert set(t[0] for t in counts) == set(collide)

Upvotes: 0

Related Questions