David
David

Reputation: 4818

qsort() vs std::sort, comparison function philosophical difference

I was wondering why there is two entirely different approaches to specify the comparison function in qsort() { C version } vs the std::sort().

qsort needs the comparison function like below: I don't know the reason why it needs three kinds of return values -1 , 0 , +1.

int comp(int *x, int *y){   
   return *x - *y; 
}

while the comparison function for std::sort(), which looks more consistent to me as it is written in terms of function to follows an invariant. i.e if x is smaller than y function return true, x is in the right position relative to y

bool comp(int x, int y){   
       return x < y; 
}

Why do we need three values -1,0,+1 when returning a bool (or int with two value 0 , 1) is much simpler and clean?

Upvotes: 5

Views: 2623

Answers (5)

Vic99999
Vic99999

Reputation: 51

There is an interesting discussion about this: https://groups.google.com/forum/#!topic/comp.lang.c++.moderated/bgGrHG_Y_Pw

Upvotes: 2

Mooing Duck
Mooing Duck

Reputation: 66932

I'd wondered this same thing and came up with a theory based on strcmp and std::string::compare. Namely, doing a comparison can take a relatively large amount of time. To determine if an object A is less, equal, or greater than another object B, you can use:

if (A < B) {
    //do stuff where A is less
} else if (B < A) {
    //do stuff where A is greater
} else {
    //do stuff where A is equal
}

Which commonly requires the iteration of A and B to be done twice, once for A<B and once for B<A. If you check all three possibilities at the same time, you only have to iterate over the strings one time. Thus the -1, 0, 1 convention was used.

However, C++ seems to have dropped this convention. One argument I heard was that computers have changed, and due to the complexity of the code for doing the three-way comparison, doing one three-way comparison was slower and more error prone, and most of the time we didn't care about the equality case. In fact, all the standard sorting algorithms are designed to work like this, though an individual implementation might do something more interesting.

if (A < B) {
    //do stuff where A is less
} else {
    //do stuff where A is greater or equal
}

According to MSVC's timings on this test, string::operator< is nearly twice as fast as strcmp, and calling string::operator< twice was only slightly slower than doing it once. (Caching and simpler code I guess?). GCC's results are similar.

Upvotes: 2

Fred Foo
Fred Foo

Reputation: 363627

Others have pointed out the equivalence of the two ways of doing comparisons; here's why the two approaches are followed.

In C, the comparison needs to be a function pointer. That means you always get function call and pointer indirection overhead. When qsort was designed back in the 1970s on a PDP-11 computer, the amount of overhead had to be reduced so comparison functions such as strcmp did a three-way comparison in a single function call. (Note that qsort is generally an unstable sort, so the equality case may seem useless, but it can be made stable with the appropriate pointer comparisons.)

In C++, the comparison can be inlined at template instantiation time, so much of the overhead is gone (there need not even be a function call) and the simpler convention can be used. This also means that std::sort can work with operator< overloads by default.

Upvotes: 6

Jim Balter
Jim Balter

Reputation: 16406

The qsort comparison function is modeled after strcmp and memcmp, which return < 0, 0, or > 0 ... which is more information than just returning a < or >= indicator ... it takes two such calls to determine whether two elements are equal. The notion of "invariant" doesn't apply here: obviously the invariant that a[i] < a[i+1] does not apply to the original array ... in fact it doesn't apply to the final array because a[i] == a[i+1] is possible. Nor does the term "consistent" apply ... the results are and must be consistent for both types of comparison functions. "cleaner" is in the eye of the beholder, and "much simpler" is an overstatement.

Upvotes: 2

user2545918
user2545918

Reputation:

If for any two x and y, either x < y or x == y or x > y holds, then two way of giving the comparison function are equivalent. One may define == and > operators in terms of < as follows:

  • x == y is equivalent to !(x < y) && !(y < x)
  • x > y is equivalent to y < x

As you may realize, it is generally more efficient (and simpler) to implement <, <=, ==, >= and > in terms of the three-way compare operation, then implementing == in terms of < as shown above. I think that should be the reason why C (and many other languages actually) chose the three-way compare function, even though quicksort might not exploit that extra information.

C++ has operator overloading, but no three-way compare operator (neither has C), so shifting from three-way compare to the 'less' operator (despite the above mentioned potential disadvantages) allows to exploit operator overloading.

Upvotes: 2

Related Questions