AlanK
AlanK

Reputation: 9843

Big-O of list comprehension that calls max in the condition: O(n) or O(n^2)?

Q. Write an algorithm that returns the second largest number in an array

a = [1, 2, 3, 4, 5]
print(max([x for x in a if x != max(a)]))
>> 4

I'm trying to figure out how this algorithm works and whether or not pythons internal magic will make this as efficient as writing a linear algorithm which just loops over the list a once and stores the highest and second highest values.

Correct me if I'm wrong:

Would python be smart enough to cache the value of max(a) or would this mean that the list comprehension part the algorithm is O(n^2)?

And then the final max([listcomp]) would be another O(n), but this would only run once after the comprehension is finished, so the final algorithm would be O(n^2)?

Is there any fancy business going on internally that would cache the max(a) value and result in this algorithm working out quicker than O(n^2)?

Upvotes: 4

Views: 2612

Answers (2)

cs95
cs95

Reputation: 402852

Would python be smart enough to cache the value of max(a) or would this mean that the list comprehension part the algorithm is O(n^2)?

No, because, as MSeifert said in a comment, python doesn't make assumptions about a, and so doesn't cache the value of max(a), which is recomputed each time.

You might want to consider an implementation that keeps track of the largest two items in one pass. You'll need to code an explicit for loop and do it. Here's a useful link from GeeksForGeeks (this, I recommend).

Alternatively, you can consider multiple traversals that still ends up being linear in complexity.

In [1782]: a = [1, 2, 3, 4, 5]

In [1783]: max(set(a) - {max(a)}) # 3 linear traversals
Out[1783]: 4

There's scope for improvement here, but like I said, nothing beats the explicit for loop approach.

Upvotes: 3

MSeifert
MSeifert

Reputation: 152735

The easy way to find out is timing it. Consider this timing code:

for i in range(1, 5):
    a = list(range(10**i))
    %timeit max([x for x in a if x != max(a)])
17.6 µs ± 178 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
698 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
61.6 ms ± 340 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
6.31 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Each time it multiplied the number of elements by 10 the runtime increased by 100. That's almost certainly O(n**2). For an O(n) algorithm the runtime would increase linearly with the number of elements:

for i in range(1, 6):
    a = list(range(10**i))
    max_ = max(a)
    %timeit max([x for x in a if x != max_])
4.82 µs ± 27.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
29 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
262 µs ± 3.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.42 ms ± 13 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
24.9 ms ± 231 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

But I'm not certain the algorithm really does what is asked. Consider the list a=[1,3,3] even the heapq module tells me that the second largest element is 3 (not 1 - what your algorithm returns):

import heapq

>>> heapq.nlargest(2, [1,3,3])[0]
3

Upvotes: 6

Related Questions