Reputation: 2532
I have an array of numbers. I want to sort them and remove duplicates. This answer suggest to use set
and sort
for that kind of operation. The order of operations shouldn't change the results, so I measured the time of computations.
from numpy.random import randint
from time import clock
n = 1000000
A=randint(0,n,n)
t1=clock()
B=sorted(set(A))
t2=clock()
C=set(sorted(A))
t3=clock()
print t2-t1, t3-t2
>>> 0.48011 1.339263
sorted(set(A)) is roughly three times faster than set(sorted(A)).
What makes the one faster than the other? Also, is there any faster way to do it?
Upvotes: 1
Views: 1273
Reputation: 32189
This is because when you call:
set(sorted(A))
you are sorting the original full list and then filtering out the duplicate values. However, when you call:
sorted(set(A))
you are first shortening the list by removing duplicate values using set
and then sorting the much smaller list hence the shorter time.
Hope that makes sense.
>>> A = [1,2,3,1,2,3,1,2,3,1,2,3]
>>> A = sorted(A)
[1,1,1,1,2,2,2,2,3,3,3,3]
>>> set(A)
{1, 2, 3}
On the other hand:
>>> A = [3,2,1,3,2,1,3,2,1,3,2,1]
>>> A = set(A)
{3, 2, 1}
>>> sorted(A)
{1, 2, 3}
And as @Barmar said, sorting is much slower than removing duplicates therefore there is a real time gain when you have to sort a much smaller list (1/4 of the list in my example above)
a = [1,2,3] * 10000
set(sorted(a)) --> 0.1890001297
sorted(set(a)) --> 0.0079998970
Upvotes: 8