Reputation: 109
Disclaimer: it's been about 5 years since I wrote any C/C++. I usually code in python.
I have a vector of strings that I wanted to sort in a certain way (by length), so I wrote my own comparator function:
bool sortSubstrs(string a, string b)
{
if (a.length() > b.length()) { return true; }
else { return false; }
}
I had a bug in it initially, so I put in a print statement (yes, I know I should be using an ide, but I was being lazy and just using vim):
bool sortSubstrs(string a, string b)
{
cout << a.length() + " " + b.length() << endl;
if (a.length() > b.length()) { return true; }
else { return false; }
}
I expected it to print the length of a
and b
, but instead it prints a bunch of seemingly "random" stuff from earlier in the stack (I'm guessing).
Here's a full, minimal example to reproduce:
sort_words.cpp:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
vector<string> to_vector(char* filename) // read from file and put into a vector
{
vector<string> karray;
FILE *ifile = fopen(filename, "r");
int ch;
int idx = 0;
string s1 = "";
while( EOF != (ch=getc(ifile)))
{
if ('\n' == ch)
{
karray.push_back(s1);
s1 = "";
} else {
string s2(1,ch);
s1 = s1 + s2;
}
}
fclose(ifile);
return karray;
}
bool sortSubstrs(string a, string b)
{
cout << a.length() + " " + b.length() << endl;
if (a.length() > b.length()) { return true;}
else { return false;}
}
void sort_words(char* filename){
vector<string> wordArray;
wordArray = to_vector(filename);
sort(wordArray.begin(), wordArray.end(), sortSubstrs);
wordArray.erase(unique(wordArray.begin(), wordArray.end()), wordArray.end());
//print out the sorted array
for (unsigned n=0; n < wordArray.size(); n++) {cout << wordArray.at(n) << "\n";}
}
int main(int argc, char *argv[])
{
if (FILE *ifile = fopen(argv[1], "r")){
fclose(ifile);
sort_words(argv[1]);
} else {
cout << "file doesn't exist" << endl;
return 1;
}
}
and an input.txt file:
sh
zsh
bash
Compiling with g++ (4.8.5 running on centos7) gives no errors:
g++ sort_words.cpp -o sort_words
and running ./sort_words input.txt
gives this output:
ile doesn't exist
e doesn't exist
bash
zsh
sh
I thought perhaps the check in main for the file existing/readable was somehow messing this up, so I rewrote it:
int main(int argc, char *argv[]) { sort_words(argv[1] }
but recompiling and running with the same input file gives this output instead:
ector::_M_insert_aux
tor::_M_insert_aux
bash
zsh
sh
Using a longer input file just prints out more of these "broken" strings in a similar pattern. So, what exactly is going on? My memory is a bit hazy, but I know that most sorting algorithms are recursive (and it looks like C++ uses a hybrid recursive sort method: Wikipedia: sort (C++)), but in any case the recursion occurs within the sort function and (I think) the sort function doesn't have access to the other functions in memory.
Does using a custom comparator significantly change how the sorting function works? My understanding was that inside the sort function, it would just call the custom function instead of the default > or < operators, but clearly something is a little different than I expected. Either that or I just have a bug somewhere that I've missed. I'm pretty stumped, unfortunately.
Upvotes: 0
Views: 419
Reputation: 75707
This has nothing to do with sort whatsoever. This line is wrong:
cout << a.length() + " " + b.length() << endl;
It should be:
cout << a.length() << " " << b.length() << endl;
As a relic and compatibility with C in C++ string literals have C array type, i.e. " "
is of type char[2]
. Again as a C relic C array types decay to pointers. So a.length() + " "
is just pointer arithmetic. Not what you want.
Out of curiosity, what is happening when I use the +?
pointer arithmetics. You move the pointer from the beginning of the " "
by
a.length()
elements. Getting a pointer 1 past beyond the original array results in Undefined Behavior.
Here's a better example to illustrate:
const char str[7] = "abcdef";
str + 2
// is equivalent with:
&str[0] + 2
// and is a pointer pointing to the 'c' letter from the array
str + 10
// is equivalent with:
&str[0] + 10
// and is a pointer pointing outside the array
// this is Undefined Behavior
Upvotes: 4