Reputation: 54133
I'm currently sorting by the std::string < operator. The problem with it is that:
30 < 9. The 30 shows up before the 9 since 3 < 9, Windows 9x had this issue to. How could I go about sorting them numerically so that "30 Foxes" shows up after "9 dogs". I should also add that I'm using utf 8 encoding.
Thanks
Upvotes: 12
Views: 27239
Reputation: 86
This what worked for me (assuming no leading zeroes), i.e. the idea is that phonetic compare can be applied just to numbers with same number of digits.
auto numeric_str_cmp = [](const std::string& a, const std::string& b) -> bool {
return (a.size() < b.size()) || (a.size() == b.size() && a < b);
};
std::sort(numeric_strings.begin(), numeric_strings.end(), numeric_str_cmp);
Upvotes: 2
Reputation: 2185
Here's a version that doesn't convert to integer and thus works for long strings of digits regardless of sizeof(int).
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>
int numcmp(const char *a, const char *aend, const char *b, const char *bend)
{
for (;;) {
if (a == aend) {
if (b == bend)
return 0;
return -1;
}
if (b == bend)
return 1;
if (*a == *b) {
++a, ++b;
continue;
}
if (!isdigit((unsigned char) *a) || !isdigit((unsigned char) *b))
return *a - *b;
// skip leading zeros in both strings
while (*a == '0' && ++a != aend)
;
while (*b == '0' && ++b != aend)
;
// skip to end of the consecutive digits
const char *aa = a;
while (a != aend && isdigit((unsigned char) *a))
++a;
std::ptrdiff_t alen = a - aa;
const char *bb = b;
while (b != bend && isdigit((unsigned char) *b))
++b;
std::ptrdiff_t blen = b - bb;
if (alen != blen)
return alen - blen;
// same number of consecutive digits in both strings
while (aa != a) {
if (*aa != *bb)
return *aa - *bb;
++aa, ++bb;
}
}
}
int numcmp(const std::string& a, const std::string& b)
{
return numcmp(a.data(), a.data() + a.size(),
b.data(), b.data() + b.size());
}
int numcmp(const char *a, const char *b)
{
return numcmp(a, a + strlen(a), b, b + strlen(b));
}
Upvotes: 3
Reputation: 1398
If you are targeting Windows (XP+) and can afford to convert your strings to utf-16, you can use the StrCmpLogicalW
function from Shlwapi. See msdn for details.
Otherwise, ICU provides this functionality in its collators. See UCOL_NUMERIC_COLLATION
.
Upvotes: 3
Reputation: 53319
You can create a custom comparison function to use with std::sort
. This function would have to check if the string begins with a numeric value. If it does, convert the numeric part of each string to an int
using some mechanism like a stringstream. Then compare the two integer values. If the values compare equally, compare the non-numeric part of the strings lexicographically. Otherwise, if the strings don't contain a numeric part, simply compare the two strings lexicographically as normal.
Basically, something like the following (untested) comparison function:
bool is_not_digit(char c)
{
return !std::isdigit(c);
}
bool numeric_string_compare(const std::string& s1, const std::string& s2)
{
// handle empty strings...
std::string::const_iterator it1 = s1.begin(), it2 = s2.begin();
if (std::isdigit(s1[0]) && std::isdigit(s2[0])) {
int n1, n2;
std::stringstream ss(s1);
ss >> n1;
ss.clear();
ss.str(s2);
ss >> n2;
if (n1 != n2) return n1 < n2;
it1 = std::find_if(s1.begin(), s1.end(), is_not_digit);
it2 = std::find_if(s2.begin(), s2.end(), is_not_digit);
}
return std::lexicographical_compare(it1, s1.end(), it2, s2.end());
}
And then...
std::sort(string_array.begin(), string_array.end(), numeric_string_compare);
EDIT: Of course, this algorithm is only useful if you're sorting strings where the numeric portion appears at the beginning of the string. If you're dealing with strings where the numeric portion can appear anywhere in the string, then you need a more sophisticated algorithm. See http://www.davekoelle.com/alphanum.html for more information.
Upvotes: 13