isklenar
isklenar

Reputation: 994

Why is my heapsort faster than Javas and C++s sort functions?

I recently learned how to use heaps and the beauty of heapsort. I decided to compare heapsort with std::sort in C++ and Arrays.sort() in Java. I sorted an array of integers, each randomly generated in the range of <0; 2000000000)

I generated 100,000,000 integers into an array in Java, and ran Arrays.sort(), then generated new random sequence and ran my heapSort(). This is an output of my program in Java:

Arrays.sort time: 10.923 seconds.

Heap sort time: 1.402 seconds.

So heapsort is about 8 times faster.

I then ran similar code in C++, this time using std::vector as my container (due to the fact that std::sort needs two iterators).

C++ results:

Heapsort: 3.213

std::sort: 37.264

So in my program, std::sort is about 12 times slower.

In Java, I measured time using System.currentTimeMilis() and in C++ I used clock() from .

This was tested on Windows 7, Quad-Core Intel i5 2500k, overclocked to 4.8GHz. C++ was compiled with -Wall -pedantic flags.

Can anyone tell me what is going on? Is heapsort really that much faster? Or did I make an mistake in my code? I don't want to flood this post with lots of code, so I will link it at the end of this post.

BTW: Yes, I'm aware that Arrays.sort() is stable and heapsort isn't. Java doesn't have an unstable sort (atleast, I haven't found one). That is why I used std::sort in C++, to see if it had something to do with stability or not.

Source code, both C++ and Java: https://gist.github.com/anonymous/7475399

Upvotes: 0

Views: 688

Answers (3)

greatwolf
greatwolf

Reputation: 20838

There is one other issue in your C++ code that relates to how you're generating your random distribution:

int randomval()
{
  double d;
  int result;
  d = rand() / RAND_MAX;
  result = (int) (d * N);
  return result;
}

d is always going to be 0 because you are perform an int division and then implicitly converting it to double afterwards. In short, your randomval function isn't giving you any random values at all.

When you sort this with your own heap sort, the same code path is always executed. In your case, heapify will probably never execute this portion of code:

if (largest != i)
{
    int tmp = heap[i];
    heap[i] = heap[largest];
    heap[largest] = tmp;

    heapify(heap, largest, heapsize);
}

which is why your implementation appears to be faster.

Fixing the random test data with an actual distribution I think you'll find your implementation to be slower:

#include <random>
// snip...
int main()
{
  int length = 10000000;
  std::vector<int> vint1;

  std::default_random_engine gen;
  std::uniform_int_distribution<int> randomval(1, N);
  for (int i = 0; i < length; i++)
  {
        vint1.push_back(randomval(gen));
  }
  std::vector<int> vint2 = vint1; /* so we're sorting same testdata for both */
  // ...

Rerunning the benchmark again shows:

g++ -std=c++0x -Wall -pedantic -O2 heapsorttest.cpp -o heapsorttest.exe
heapsorttest.exe

Heapsort: 5.822s
true

std::sort: 0.936s
true

Upvotes: 1

Zac Howland
Zac Howland

Reputation: 15872

You do not properly swap items in either your Java (as john pointed out), nor your C++ code:

void heapSort(vector<int> & heap, int length)
{
    int heapsize = length;
    buildHeap(heap, heapsize);
    for(int i = heapsize-1; i >= 1; i--)
    {
        int tmp = heap[0];
        heap[i] = heap[0];
        heap[i] = tmp; // overwrote the item you just tried to swap!
        heapsize--;
        heapify(heap, 0, heapsize);
    }
}

In short, your code is "more efficient" because it doesn't do any sorting at all.

Upvotes: 2

john
john

Reputation: 87959

Your Java code looks bugged to me

int tmp = heap[0];
heap[i] = heap[0];
heap[i] = tmp;

That's not the code to swap two elements.

Does that make a difference to the execution time? I don't know heap sort well enough to be sure.

Upvotes: 7

Related Questions