Reputation: 21461
I'm trying to do a huge amount of simple "intersection" operations with integers. Unfortunately, I do not have numpy/scipy available in the setup, and I'm not able to change that.
I noticed on stackoverflow that the Python set operation nicely sorts the data, which not only speeds up loads of cases, but in my case, I'd actually like to sort the data as well, thus it would be an awesome bonus.
I'm now just afraid it does not always work, so I went to test:
import random
one = range(100)
two = range(50)
three = range(50)
for i in xrange(1000000):
# shuffle the lists
random.shuffle(one)
random.shuffle(two)
# do set operation
res = [v for v in set(one) & set(two)]
if res != three:
print res
The result is that all the samples are sorted (no wrong cases are printed).
While this is quite convincing, I'd like to know if there would be a case where the integers are not completely sorted when using a set intersection?
Upvotes: 0
Views: 318
Reputation: 304403
Counterexamples are very easy to find if you know where to look
>>> [v for v in set(range(-10,0)) & set(range(-5,10))]
[-2, -5, -4, -3, -1]
Upvotes: 1
Reputation: 41523
No, it's not.
CPython's set intersection implementation works by parallel iteration over the two sets, in hash order. Matching hashes are further tested for equality.
If you have a set of small contiguous int
s, they'll all hash to themselves, so everything will work out fine. But if the sets are anything else (widely spaced ints, strings, whatever), this same effect won't appear.
Upvotes: 3
Reputation: 122496
A set doesn't have order so any ordering is accidental. Or, to be precise, it does have some ordering but you can't make any assumptions regarding it. If you want the result to be sorted you'll need to sort it yourself using sorted()
.
Upvotes: 2