Reputation: 2840
Is it faster to loop through a Python set
of numbers or a Python set
of letters given that each set is the exact same length and each item within each set is the same length? Why?
I would think that there would be a difference because letters have more possible characters [a-zA-Z] than numbers [0-9] and therefor would be more 'random' and likely affect the hashing to some extent.
numbers = set([00000,00001,00002,00003,00004,00005, ... 99999])
letters = set(['aaaaa','aaaab','aaaac','aaaad', ... 'aaabZZ']) # this is just an example, it does not actually end here
for item in numbers:
do_something()
for item in letters:
do_something()
where len(numbers) == len(letters)
Update: I am interested in Python's specific hashing algorithm and what happens behind the scenes with this implementation.
Upvotes: 1
Views: 367
Reputation: 279225
There might be some particular implementation details of Python that I'm not aware of that mess with my general arguments here, but:
timeit
, with a realistic example of the data you care about. Speculation is usually a waste of time.You can see the results of Python's hashing algorithms like this:
>>> foo = 3
>>> foo.__hash__()
3
>>> foo = 1856348
>>> foo.__hash__()
1856348
>>> foo = "\x00"
>>> foo.__hash__()
1
>>> foo = "\x01"
>>> foo.__hash__()
128000384
>>> foo = "\x02"
>>> foo.__hash__()
256000771
So on my copy of Python, those hash results match these reported Python hash algorithms. As ever with CPython, you can look at the source to confirm the algorithm.
Upvotes: 7
Reputation: 18633
You can't know until you profile! Here's some crude data:
% python -mtimeit -s 'import string;import random;s=set("".join(random.sample(string.printable, 5))for _ in range(10000))' 'for foo in s: bar=foo'
1000 loops, best of 3: 376 usec per loop
% python -mtimeit -s 'import random;s=set(int("".join(map(str,random.sample(range(10),5))))for _ in range(10000))' 'for foo in s: bar=foo'
1000 loops, best of 3: 322 usec per loop
It looks like you're correct that there is a slight difference, but you'd need to do a lot more testing to say much with confidence.
Upvotes: 2