ning
ning

Reputation: 2111

Why does this O(n^2) solution work faster than the O(n) solution?

edit: Sorry! It seems I had carelessly copy-pasted the same code.


The question:

Given a list of numbers from 0 to N-1, your job is to find the missing number. Your program should take in a list of integers and return a single integer (the missing number).

i.e. an input of [0, 1, 3, 4, 5] should return 2.

I found two solutions, one I think in O(n), and one in (On^2).

O(n) Solution

def alt_find_missing(input_list):
    input_sum = sum(input_list)
    range_sum = sum([i for i in range(min(input_list), max(input_list)+1)])
    return range_sum - input_sum

O(n^2) Solution

def find_missing(input_list):
    input_set = set(input_list)
    for i in range(min(input_list), max(input_list)+1):
        if i not in input_set:
            return i

However, the O(n^2) solutions runs faster than the O(n) in multiple timeit tests:

List with 4 elements, using for ... in set (x 1 000 000)
1.1550223514080038
List with 4 elements, using difference of sums (x 1 000 000)
1.391524411772641
List with 100 elements, using for ... in set (x 1 000 000)
8.43574248785071
List with 100 elements, using difference of sums (x 1 000 000)
8.94695660741872
List with 1000 elements, using for ... in set (x 100 000)
8.1138781071155
List with 1000 elements, using difference of sums (x 100 000)
8.383110816298519

Why is this?

Upvotes: 1

Views: 1437

Answers (1)

Dennis Sakva
Dennis Sakva

Reputation: 1467

The second algorithm is not O(n^2). Set lookups are O(1) and iterating over a set is O(n). So the difference between two algorithms is due to different constant factors.

Here is a short script that shows linear behavior of both functions

import timeit
from functools import partial
from matplotlib import pyplot as plt

def alt_find_missing(input_list):
    input_sum = sum(input_list)
    range_sum = sum([i for i in range(min(input_list), max(input_list)+1)])
    return range_sum - input_sum


def find_missing(input_list):
    input_set = set(input_list)
    for i in range(min(input_list), max(input_list)+1):
        if i not in input_set:
            return i


t1=[]
t2=[]

rng=range(10000,1000000,10000)

for n in rng:
    func1=partial(find_missing,range(n))
    func2=partial(alt_find_missing,range(n))
    t1.append(timeit.timeit(func1,number=5))
    t2.append(timeit.timeit(func2,number=5)) 

plt.plot(rng,t1, label='Find missing')
plt.plot(rng,t2, label='Alt find missing')
plt.ylabel('Execution time')
plt.xlabel('Problem size')
plt.legend()
plt.show()

The result looks pretty linear to me :) Certainly there's something peculiar going on at some problem sizes, but nothing to throw results significantly out of linearity zone.

Regards. Two linear functions

Upvotes: 2

Related Questions