Eduard Rostomyan

Reputation: 6546

O(NlogN) algorithm runs faster than O(n)... wait, what?

I am a bit confused, to be honest. I was working through one of the classic algorithm problems: given a collection of integers, determine whether any two of its elements sum to a given number.

So I have implemented 2 solutions.

#include <unordered_set>
#include <vector>

bool find1(std::vector<int>& V, int sum) 
{
    std::unordered_set<int> hashTable;
    for (int i = 0; i < V.size(); ++i) 
    {
        if (hashTable.find(V[i]) != hashTable.end()) 
        {
            return true;
        }
        hashTable.insert(sum - V[i]);
    }
    return false;
}

#include <algorithm>

bool find2(std::vector<int>& V, int sum) 
{
    for (int i = 0; i < V.size() ; ++i) 
    {
        if (std::binary_search(V.begin(), V.end(), sum - V[i])) 
        {
            return true;
        }
    }
    return false;
}

find1 is expected to be a linear-time algorithm (depending on the bucket load factor and the efficiency of the hash function).

find2 is expected to be O(N log N): we loop over the vector and do a binary search on every iteration.

After implementing these functions I tried to compare the running times of the two algorithms on a relatively large collection, and the result confused me.

#include <chrono>
#include <cstdlib>
#include <iostream>

int main() 
{
    std::vector<int> V(10000,0);

    std::chrono::system_clock::time_point now1 = std::chrono::system_clock::now();

    for (int i = 0; i < 100; ++i) 
    {
        bool b = find1(V, 1000);
    }

    std::chrono::system_clock::time_point then1 = std::chrono::system_clock::now();
    std::cout <<"Linear with hashing = "<< std::chrono::duration_cast<std::chrono::microseconds>(then1 - now1).count()<<std::endl;

    std::chrono::system_clock::time_point now2 = std::chrono::system_clock::now();
    std::sort(V.begin(), V.end());
    for (int i = 0; i < 100; ++i)
    {
        bool b = find2(V, 1000);
    }

    std::chrono::system_clock::time_point then2 = std::chrono::system_clock::now();
    std::cout <<"NlogN with binary_search = " <<std::chrono::duration_cast<std::chrono::microseconds>(then2 - now2).count() << std::endl;

    system("pause");
}

Here I initialize the vector with 0s to make sure that both algorithms run their worst case.
The output of the program is:

Linear with hashing = 6759245         
NlogN with binary_search = 4508025

How is this possible? Can anyone please explain this to me?

Upvotes: 4

Views: 485

Answers (3)

Goswin von Brederlow

Reputation: 12332

You create a hash table without an expected size and then insert elements into it one by one. This causes the hash table to be resized over and over, each time triggering a system call to allocate more memory.

While this is all amortized O(1) per insert, the hidden constant for the system call is large enough to make the binary search faster.

Try setting an expected size for the hash table of about V.size() * 1.2 to avoid rehashing, as in the sketch below. If that isn't enough, compare the timings with 100000, 1000000, 10000000, ... values. You should see the hash table winning as N gets larger.
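
A minimal sketch of that change (the name find1_reserved is just illustrative; only the reserve call differs from find1 in the question):

#include <cstddef>
#include <unordered_set>
#include <vector>

bool find1_reserved(std::vector<int>& V, int sum)
{
    std::unordered_set<int> hashTable;
    // Reserve room for all elements up front so inserting never triggers a rehash.
    hashTable.reserve(static_cast<std::size_t>(V.size() * 1.2));
    for (std::size_t i = 0; i < V.size(); ++i)
    {
        if (hashTable.find(V[i]) != hashTable.end())
        {
            return true;
        }
        hashTable.insert(sum - V[i]);
    }
    return false;
}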

Note: the all-zeros vector is not a worst-case input for the binary search either, which might be even more reason why it comes out faster.

Upvotes: 7

eerorika

Reputation: 238381

Just because the upper bound of the asymptotic complexity of one algorithm is lower than another's doesn't mean that it's faster for every input. It just means that there exists a certain input size N', beyond which the less complex algorithm will be faster. This size is specific to each particular system running the program.

Measuring the asymptotically more complex algorithm to be faster simply means that your test stayed below the size N'. However, that assumes your complexity analysis applies to the program in the first place; for example, analyzing the worst-case complexity is wrong if your program exercises the algorithm with best-case input, and vice versa.
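
A rough sketch of how one could look for that crossover empirically (the helper time_one and the size list are illustrative, and it assumes find1 and find2 from the question are in scope):

#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Times a single call of f on a vector of n zeros, in microseconds.
template <typename F>
long long time_one(F f, std::size_t n)
{
    std::vector<int> V(n, 0);
    auto start = std::chrono::steady_clock::now();
    volatile bool found = f(V, 1000);   // volatile discourages optimizing the call away
    (void)found;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

int main()
{
    for (std::size_t n : {10000, 100000, 1000000, 10000000})
    {
        std::cout << "n = " << n
                  << "  hash: " << time_one(find1, n) << " us"
                  << "  binary_search: " << time_one(find2, n) << " us\n";
    }
}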

For what it's worth, the results on my system are:

Linear with hashing = 9557
NlogN with binary_search = 15828

Upvotes: 8

user1196549

Reputation:

O(N) is asymptotically faster than O(N Log N). That doesn't mean that it is faster.

Review the definition of the Big-O notation.
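
To make that concrete with made-up constants: if the hash-based version costs roughly c1 * N operations and the binary-search version roughly c2 * N * log2(N), the O(N log N) code is cheaper exactly when log2(N) < c1 / c2. With, say, c1 = 200 and c2 = 5, the asymptotically "slower" algorithm wins for every N below 2^40.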

Upvotes: 1
