Why do these two variations on the "quick sorting" algorithm differ so much in performance?

Question

I initially thought up some sorting algorithm to code in C++ for practice. People told me it's very inefficient (indeed, sorting a few hundred numbers took ~10 seconds). The algorithm was to remember the first element ("pivot") in a vector, then go through every other element, moving each element to the left of the pivot if it is smaller, or doing nothing otherwise. This would split the list into two smaller lists to sort; the rest is done through recursion.

So now I know that dividing the list into two and doing recursions like this is essentially what quicksorting does (although there are a lot of variations on how to do the partitioning). I didn't understand why my original code was so inefficient, so I wrote up a new one. Someone had mentioned that it is because of the insert() and erase() functions, so I made sure to not use those, but instead used swap().

Old (slow):

template <typename T>
void sort(vector<T>& vec){
  int size = vec.size();
  if (size <= 1){ //this is the most basic case
    return;
  }

  T pivot = vec[0];
  int index = 0; //to help split the list later
  for (int i = 1; i < size; ++i){ //moving (or not moving) the elements
    if (vec[i] < pivot){
      vec.insert(vec.begin(), vec[i]);
      vec.erase(vec.begin() + i + 1);
      ++index;
    }
  }

  if (index == 0){ //in case the 0th element is the smallest
    vec.erase(vec.begin());
    sort(vec);
    vec.insert(vec.begin(), pivot);
  }
  else if(index == size - 1){ //in case the 0th element is the largest
    vec.pop_back();
    sort(vec);
    vec.push_back(pivot);
  }

  //here is the main recursive portion
  vector<T> left = vector<T>(vec.begin(), vec.begin() + index);
  sort(left);
  vector<T> right = vector<T>(vec.begin() + index + 1, vec.end());
  sort(right);

  //concatenating the sorted lists together
  left.push_back(pivot);
  left.insert(left.end(), right.begin(), right.end());

  vec = left;
}

New (fast):

template <typename T>
void quickSort(vector<T>& vec, const int& left, const int& right){
  if (left >= right){ //basic case
    return;
  }
  T pivot = vec[left];
  int j = left; //j will be the final index of the pivot before the next iteration

  for (int i = left + 1; i <= right; ++i){
    if (vec[i] < pivot){
      swap(vec[i], vec[j]); //swapping the pivot and lesser element
      ++j;
      swap(vec[i], vec[j]); //sending the pivot next to its original spot so it doesn't go to the right of any greater element
    }
  }

  //recursion
  quickSort(vec, left, j - 1);
  quickSort(vec, j + 1, right);
}

The difference in performance is insane; the newer version can sort through tens of thousands of numbers in less than a second, while the first one can't do that with 100 numbers. What are erase() and insert() doing to slow it down, exactly? Is it really the erase() and insert() causing the bottleneck, or is there something else I am missing?

thisisbenmanley · Accepted Answer

First of all, yes, insert() and erase() will be much slower than swap().

insert() will, at best, require every element after the insertion point to be moved one spot further along the vector. Think about what happens if you shove yourself into the middle of a crowded line of people - everyone behind you has to take one step back to make room for you. Your code inserts at vec.begin(), so every element in the vector gets shifted on each call. In the worst case, because inserting increases the vector's size, the vector may run out of space in its current memory block, and the entire vector is then transferred, element by element, into a new block with room for the newly inserted item. Likewise, when an element in the middle of a vector is erase()'d, every element after it must be moved up one spot, just like how everyone behind you in a line would take one step up if you left it. In comparison, swap() only touches the two elements being swapped. Since the insert()/erase() pair runs inside a loop over the whole vector, a single partitioning pass can cost on the order of n^2 element moves instead of n.
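
To make the cost difference concrete, here is a rough sketch that counts element copies and moves for both approaches. The Counted type, the g_transfers counter, and both function names are made up for this illustration and are not part of the original code; the exact count for insert()/erase() depends on your standard library, so treat the numbers as ballpark figures.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical counter: total number of element copies/moves performed.
long long g_transfers = 0;

// Hypothetical element type that bumps the counter whenever it is copied or moved.
struct Counted {
    int value = 0;
    Counted() = default;
    Counted(const Counted& o) : value(o.value) { ++g_transfers; }
    Counted(Counted&& o) noexcept : value(o.value) { ++g_transfers; }
    Counted& operator=(const Counted& o) { value = o.value; ++g_transfers; return *this; }
    Counted& operator=(Counted&& o) noexcept { value = o.value; ++g_transfers; return *this; }
};

// Move element i to the front the way the slow version does; return the transfers used.
long long transfersForInsertErase(std::size_t n, std::size_t i) {
    std::vector<Counted> vec(n);
    vec.reserve(n + 1);               // spare capacity, so no reallocation is counted
    g_transfers = 0;
    vec.insert(vec.begin(), vec[i]);  // shifts all n existing elements one slot right
    vec.erase(vec.begin() + i + 1);   // shifts every element after the erased one left
    return g_transfers;
}

// Exchange two elements the way the fast version does; return the transfers used.
long long transfersForSwap(std::size_t n, std::size_t i, std::size_t j) {
    std::vector<Counted> vec(n);
    g_transfers = 0;
    std::swap(vec[i], vec[j]);        // one move construction plus two move assignments
    return g_transfers;
}
```

Calling transfersForInsertErase(1000, 500) should report somewhere around 1,500 transfers, while transfersForSwap(1000, 0, 500) reports 3 no matter how large the vector is.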

In addition to that, I also noticed another major efficiency improvement between the two code samples:

In the first code sample, you have:

vector<T> left = vector<T>(vec.begin(), vec.begin() + index);
sort(left);
vector<T> right = vector<T>(vec.begin() + index + 1, vec.end());
sort(right);

which uses the range constructor of C++ vectors. Every time the code reaches this point, when it creates left and right, it is traversing the entirety of vec and copying each element one-by-one into the two new vectors.
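
As a rough illustration of how much copying this adds up to, the sketch below sorts by splitting into new vectors and returns how many elements were copied into sub-vectors along the way. The name copyingSort is hypothetical, and I've swapped in std::stable_partition for your insert()/erase() loop so that only the range-constructor copies are counted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts vec by building new left/right vectors at every level of recursion.
// Returns the number of elements copied into those sub-vectors in total.
template <typename T>
std::size_t copyingSort(std::vector<T>& vec) {
    if (vec.size() <= 1) {
        return 0;
    }
    const T pivot = vec[0];

    // Stand-in for the original partitioning loop (an assumption of this sketch):
    // elements smaller than the pivot end up directly after it.
    auto mid = std::stable_partition(vec.begin() + 1, vec.end(),
                                     [&pivot](const T& x) { return x < pivot; });

    // The range constructor copies every element in the range into fresh storage.
    std::vector<T> left(vec.begin() + 1, mid);
    std::vector<T> right(mid, vec.end());
    std::size_t copied = left.size() + right.size();

    copied += copyingSort(left);
    copied += copyingSort(right);

    // Concatenate the sorted halves around the pivot, as in the original code.
    left.push_back(pivot);
    left.insert(left.end(), right.begin(), right.end());
    vec = std::move(left);
    return copied;
}
```

For an already sorted vector of 100 elements, the pivot is always the smallest value, so the sub-vectors hold 99, 98, ..., 1 elements and the function returns 4950 copies in total. That is on top of whatever the partitioning itself costs.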

In the newer, faster code, none of the elements are ever copied into a new vector; the entire algorithm takes place in the exact memory space in which the original numbers existed.
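
For comparison, here is a minimal sketch of an in-place version, assuming a Lomuto-style partition with one swap per smaller element (a slight variation on your double-swap loop, not your exact code). The name quickSortInPlace and the single-argument wrapper are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Sorts vec[left..right] (inclusive bounds) without creating any new vectors.
template <typename T>
void quickSortInPlace(std::vector<T>& vec, int left, int right) {
    if (left >= right) {
        return;  // zero or one element: already sorted
    }
    const T pivot = vec[left];  // copy of the pivot value
    int j = left;               // last index of the "smaller than pivot" region

    for (int i = left + 1; i <= right; ++i) {
        if (vec[i] < pivot) {
            ++j;
            std::swap(vec[i], vec[j]);  // grow the smaller region by one element
        }
    }
    std::swap(vec[left], vec[j]);  // drop the pivot between the two regions

    quickSortInPlace(vec, left, j - 1);
    quickSortInPlace(vec, j + 1, right);
}

// Convenience wrapper (an addition for this sketch) that sorts the whole vector.
template <typename T>
void quickSortInPlace(std::vector<T>& vec) {
    if (!vec.empty()) {
        quickSortInPlace(vec, 0, static_cast<int>(vec.size()) - 1);
    }
}
```

If you compare vec.data() before and after calling quickSortInPlace(vec), the pointer is unchanged: the elements are only ever swapped around inside the buffer they started in.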
