Reputation: 378
For the sake of argument, consider the following (very bad) sorting algorithm in Python:
def so(ar):
    while True:
        le = len(ar)
        switch = False
        for y in range(le):
            if y+1 == le:
                break
            if ar[y] > ar[y+1]:
                ar[y],ar[y+1] = ar[y+1],ar[y]
                switch = True
        if switch == False:
            break
    return ar
I'm trying to understand the concept of "complexity of the algorithm" and there is one thing I don't get.
I came across a post that explains how to find the complexity of an algorithm:
You add up how many machine instructions it will execute as a function of the size of its input, and then simplify the expression to the largest (when N is very large) term and can include any simplifying constant factor.
The problem is that I cannot calculate how many machine instructions will be executed just by knowing the length of the list.
Consider the first example:
import random
import time

li = [random.randint(1,5000) for x in range(3000)]
start = time.time()
so(li)
end = time.time() - start
print(end)
Output: 2.96921706199646
Now have a look at the second example:
ok = [5000,43000,232] + [x for x in range(2997)]
start = time.time()
so(ok)
end = time.time() - start
print(end)
Output: 0.010689020156860352
We can see the same sorting algorithm run on two different lists of the same length, with two completely different execution times.
When people talk about algorithm complexity (big O notation) they normally assume that the only variable determining the complexity of the algorithm is the size of the input, but clearly, in the example above, that is not the case. It is not only the size of the list but also the positioning of each value within the list that determines the speed of the sorting.
So my question is: why do we only consider the size of the input when estimating complexity? And, if possible, can you tell me what the complexity of the algorithm above is?
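To make the difference concrete, here is a small variant of so (call it so_count; the counters are my own instrumentation, not part of the original algorithm) that reports how many outer-loop passes and swaps each list needs:

import random

def so_count(ar):
    # same algorithm as so(), but counts outer-loop passes and swaps
    passes = swaps = 0
    while True:
        passes += 1
        switch = False
        for y in range(len(ar) - 1):
            if ar[y] > ar[y+1]:
                ar[y], ar[y+1] = ar[y+1], ar[y]
                swaps += 1
                switch = True
        if not switch:
            break
    return passes, swaps

li = [random.randint(1, 5000) for x in range(3000)]
ok = [5000, 43000, 232] + [x for x in range(2997)]
print(so_count(li))  # thousands of passes and swaps for the random list
print(so_count(ok))  # only a handful of passes for the almost-sorted list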
Upvotes: 3
Views: 548
Reputation: 43503
why do we only consider the size of the input when estimating complexity?
In the narrow sense of complexity as of the use of Big O notation in computer science, it is simply by definition:
In computer science, big O notation is used to classify algorithms according to how their running time or space requirements grow as the input size grows.
In the broader sense your question could be interpreted as "why do we use Big O notation to describe algorithm complexity when the nature of the data can be just as important as its size?"
The answer here lies in the fact that algorithm development is often done on small datasets to make it easy, while in the real world the datasets are huge. When you are writing your sorting function you're most likely going to try it first on small lists of random data. You'd want the result small enough that you can verify that it worked by simply looking at the result...
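As a rough illustration (a quick sketch rather than a careful benchmark, and it assumes the so function from the question is defined in the same session), you can time so on random lists of growing size; the quadratic growth in n quickly dominates any effect of the particular values, which is exactly what Big O is meant to capture:

import random, time

def time_so(n):
    # time the questioner's so() on a random list of length n
    data = [random.randint(1, 5000) for _ in range(n)]
    start = time.time()
    so(data)
    return time.time() - start

for n in (1000, 2000, 4000):
    print(n, round(time_so(n), 3))  # doubling n roughly quadruples the time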
Upvotes: 2
Reputation: 365
Time complexity does not depend only on the size of the input. For sorting algorithms, the pattern of the input can play a significant role in the running time.
We usually state time complexity for the worst, best and average cases, and we can study which specific input orders/patterns lead to each of those cases.
For example, in the first case you provide, since the input is randomized, each particular ordering occurs with probability 1/n!. The best case (when the list is already sorted) is Ω(n) and the worst case (when the list is reverse sorted) is O(n²), but the probability of hitting the best or worst case is low. Therefore the sorting algorithm has Θ(n²) average time complexity, since with randomly distributed values the probability that a comparison leads to a swap is high.
In the second case, the order is essentially fixed, so the input tends strongly toward either the best or the worst case. Your input is much closer to the best case (it is almost sorted), which is why it takes far less time.
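To see the best and worst cases directly (a small sketch that reuses the so function from the question), feed it an already-sorted list and a reverse-sorted list of the same length:

import time

n = 3000
best = list(range(n))          # already sorted: one pass, no swaps -> Ω(n)
worst = list(range(n, 0, -1))  # reverse sorted: every adjacent pair swapped -> O(n²)

for name, data in (("best", best), ("worst", worst)):
    start = time.time()
    so(data)
    print(name, round(time.time() - start, 3))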
Upvotes: 0
Reputation: 54253
You're correct: complexity doesn't only depend on N. That's why you'll often see indications of average, worst and best cases.
Timsort is used in Python because it's O(n log n) on average, still fast in the worst case (O(n log n)), and extremely fast in the best case (O(n), when the list is already sorted).
Quicksort also has an average complexity of O(n log n), but its worst case is O(n²), which (with a naive pivot choice) occurs when the list is already sorted. That use case happens very often, so it might be worth actually shuffling the list before sorting it!
Upvotes: 3