Reputation: 131
I initially thought up some sorting algorithm to code in C++ for practice. People told me it's very inefficient (indeed, sorting a few hundred numbers took ~10 seconds). The algorithm was to remember the first element ("pivot") in a vector, then parse through every other element, moving each element to the left of the pivot if it is smaller, or not do anything otherwise. This would split the list into to smaller lists to sort; the rest is done through recursion.
So now I know that dividing the list into two and doing recursions like this is essentially what quicksorting does (although there are a lot of variations on how to do the partitioning). I didn't understand why my original code was so inefficient, so I wrote up a new one. Someone had mentioned that it is because of the insert() and erase() functions, so I made sure to not use those, but instead used swap().
Old (slow):
void sort(vector<T>& vec){
int size = vec.size();
if (size <= 1){ //this is the most basic case
return;
}
T pivot = vec[0];
int index = 0; //to help split the list later
for (int i = 1; i < size; ++i){ //moving (or not moving) the elements
if (vec[i] < pivot){
vec.insert(vec.begin(), vec[i]);
vec.erase(vec.begin() + i + 1);
++index;
}
}
if (index == 0){ //in case the 0th element is the smallest
vec.erase(vec.begin());
sort(vec);
vec.insert(vec.begin(), pivot);
}
else if(index == size - 1){ //in case the 0th element is the largest
vec.pop_back();
sort(vec);
vec.push_back(pivot);
}
//here is the main recursive portion
vector<T> left = vector<T>(vec.begin(), vec.begin() + index);
sort(left);
vector<T> right = vector<T>(vec.begin() + index + 1, vec.end());
sort(right);
//concatenating the sorted lists together
left.push_back(pivot);
left.insert(left.end(), right.begin(), right.end());
vec = left;
}
new (fast):
template <typename T>
void quickSort(vector<T>& vec, const int& left, const int& right){
if (left >= right){ //basic case
return;
}
T pivot = vec[left];
int j = left; //j will be the final index of the pivot before the next iteration
for (int i = left + 1; i <= right; ++i){
if (vec[i] < pivot){
swap(vec[i], vec[j]); //swapping the pivot and lesser element
++j;
swap(vec[i], vec[j]); //sending the pivot next to its original spot so it doesn't go the to right of any greater element
}
}
//recursion
quickSort(vec, left, j - 1);
quickSort(vec, j + 1, right);
}
The difference in performance is insane; the newer version can sort through tens of thousands of numbers in less than a second, while the first one can't do that with 100 numbers. What are erase() and insert() doing to slow it down, exactly? Is it really the erase() and insert() causing the bottleneck, or is there something else I am missing?
Upvotes: 1
Views: 138
Reputation: 416
First of all, yes, insert()
and erase()
will be much slower than swap()
.
insert()
will, in the best case, require every element after the spot where you're inserting into the vector to be moved to the next spot in the vector. Think about what happens if you shove yourself into the middle of a crowded line of people - everyone behind you will have to take one step back to make room for you. In the worst case, because inserting into the vector increases the vector's size, the vector may run out of space in its current memory location, leading to the entire vector (element by element) being copied into a new space where it has room to accommodate the newly inserted item. When an element in the middle of a vector is erase()
'd, every element after it must be copied and moved up one space; just like how everyone behind you in a line would take one step up if you left said line. In comparison, swap()
only moves the two elements being swapped.
In addition to that, I also noticed another major efficiency improvement between the two code samples:
In the first code sample, you have:
vector<T> left = vector<T>(vec.begin(), vec.begin() + index);
sort(left);
vector<T> right = vector<T>(vec.begin() + index + 1, vec.end());
sort(right);
which uses the range constructor of C++ vectors. Every time the code reaches this point, when it creates left
and right
, it is traversing the entirety of vec
and copying each element one-by-one into the two new vectors.
In the newer, faster code, none of the elements are ever copied into a new vector; the entire algorithm takes place in the exact memory space in which the original numbers existed.
Upvotes: 7
Reputation: 1043
Vectors are arrays, so inserting and deleting elements in places other than the end position is done by relocate all the elements that were after position to their new positions.
Upvotes: 2