asdf
asdf

Reputation: 109

How does the C++ sort function work when using a custom comparator?

Disclaimer: it's been about 5 years since I wrote any C/C++. I usually code in python.

I have a vector of strings that I wanted to sort in a certain way (by length), so I wrote my own comparator function:

bool sortSubstrs(string a, string b)
{
    if (a.length() > b.length()) { return true; }
    else { return false; }
}

I had a bug in it initially, so I put in a print statement (yes, I know I should be using an ide, but I was being lazy and just using vim):

bool sortSubstrs(string a, string b)
{
    cout << a.length() + " " + b.length() << endl;
    if (a.length() > b.length()) { return true; }
    else { return false; }
}

I expected it to print the length of a and b, but instead it prints a bunch of seemingly "random" stuff from earlier in the stack (I'm guessing).

Here's a full, minimal example to reproduce:

sort_words.cpp:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>

using namespace std;

vector<string> to_vector(char* filename) // read from file and put into a vector
{
    vector<string> karray;
    FILE *ifile = fopen(filename, "r");
    int ch;
    int idx = 0;
    string s1 = "";

    while( EOF != (ch=getc(ifile)))
    {
        if ('\n' == ch)
        {
            karray.push_back(s1);
            s1 = "";
        } else {
            string s2(1,ch);
            s1 = s1 + s2;
        }
    }
    fclose(ifile);
    return karray;
}

bool sortSubstrs(string a, string b)
{
    cout << a.length() + " " + b.length() << endl;
    if (a.length() > b.length()) { return true;}
    else { return false;}
}

void sort_words(char* filename){
    vector<string> wordArray;
    wordArray = to_vector(filename);

    sort(wordArray.begin(), wordArray.end(), sortSubstrs);
    wordArray.erase(unique(wordArray.begin(), wordArray.end()), wordArray.end());

    //print out the sorted array
    for (unsigned n=0; n < wordArray.size(); n++) {cout << wordArray.at(n) << "\n";}
}

int main(int argc, char *argv[])
{
    if (FILE *ifile = fopen(argv[1], "r")){
        fclose(ifile);
        sort_words(argv[1]);
    } else {
        cout << "file doesn't exist" << endl;
        return 1;
    }
}

and an input.txt file:

sh
zsh
bash

Compiling with g++ (4.8.5 running on centos7) gives no errors: g++ sort_words.cpp -o sort_words

and running ./sort_words input.txt gives this output:

ile doesn't exist
e doesn't exist
bash
zsh
sh

I thought perhaps the check in main for the file existing/readable was somehow messing this up, so I rewrote it:

int main(int argc, char *argv[]) { sort_words(argv[1] }

but recompiling and running with the same input file gives this output instead:

ector::_M_insert_aux
tor::_M_insert_aux
bash
zsh
sh

Using a longer input file just prints out more of these "broken" strings in a similar pattern. So, what exactly is going on? My memory is a bit hazy, but I know that most sorting algorithms are recursive (and it looks like C++ uses a hybrid recursive sort method: Wikipedia: sort (C++)), but in any case the recursion occurs within the sort function and (I think) the sort function doesn't have access to the other functions in memory.

Does using a custom comparator significantly change how the sorting function works? My understanding was that inside the sort function, it would just call the custom function instead of the default > or < operators, but clearly something is a little different than I expected. Either that or I just have a bug somewhere that I've missed. I'm pretty stumped, unfortunately.

Upvotes: 0

Views: 419

Answers (1)

bolov
bolov

Reputation: 75707

This has nothing to do with sort whatsoever. This line is wrong:

cout << a.length() + " " + b.length() << endl;

It should be:

 cout << a.length() << " " << b.length() << endl;

As a relic and compatibility with C in C++ string literals have C array type, i.e. " " is of type char[2]. Again as a C relic C array types decay to pointers. So a.length() + " " is just pointer arithmetic. Not what you want.

Out of curiosity, what is happening when I use the +?

pointer arithmetics. You move the pointer from the beginning of the " " by a.length() elements. Getting a pointer 1 past beyond the original array results in Undefined Behavior.

Here's a better example to illustrate:

const char str[7] = "abcdef";

str + 2
// is equivalent with:
&str[0] + 2
// and is a pointer pointing to the 'c' letter from the array

str + 10
// is equivalent with:
&str[0] + 10
// and is a pointer pointing outside the array
// this is Undefined Behavior

Upvotes: 4

Related Questions