Reputation: 185
Basically, I have to use selection sort to sort a string[]
. I have done this part but this is what I am having difficulty with.
The sort, however, should be case-insensitive, so that "antenna" would come before "Jupiter". ASCII sorts from uppercase to lowercase, so would there not be a way to just swap the order of the sorted string? Or is there a simpler solution?
void stringSort(string array[], int size) {
int startScan, minIndex;
string minValue;
for(startScan = 0 ; startScan < (size - 1); startScan++) {
minIndex = startScan;
minValue = array[startScan];
for (int index = startScan + 1; index < size; index++) {
if (array[index] < minValue) {
minValue = array[index];
minIndex = index;
}
}
array[minIndex] = array[startScan];
array[startScan] = minValue;
}
}
Upvotes: 9
Views: 12922
Reputation: 364160
Instead of the <
operator, use a case-insensitive string comparison function.
C89/C99 provide strcoll
(string collate), which does a locale-aware string comparison. It's available in C++ as std::strcoll. In some (most?) locales, like en_CA.UTF-8, A
and a
(and all accented forms of either) are in the same equivalence class. I think strcoll only compares within an equivalence class as a tiebreak if the whole string is otherwise equal, which gives a very similar sort order to a case-insensitive compare. Collation (at least in English locales on GNU/Linux) ignores some characters (like [
). So ls /usr/share | sort
gives output like
acpi-support
adduser
ADM_scripts
aglfn
aisleriot
I pipe through sort
because ls
does its own sorting, which isn't quite the same as sort
's locale-based sorting.
If you want to sort some user-input arbitrary strings into an order that the user will see directly, locale-aware string comparison is usually what you want. Strings that differ only in case or accents won't compare equal, so this won't work if you were using a stable sort and depending on case-differing strings to compare equal, but otherwise you get nice results. Depending on the use-case, nicer than plain case-insensitive comparison.
FreeBSD's strcoll was and maybe still is case sensitive for locales other than POSIX (ASCII). That forum post suggests that on most other systems it is not case senstive.
MSVC provides a _stricoll
for case-insensitive collation, implying that its normal strcoll
is case sensitive. However, this might just mean that the fallback to comparing within an equivalence class doesn't happen. Maybe someone can test the following example with MSVC.
// strcoll.c: show that these strings sort in a different order, depending on locale
#include <stdio.h>
#include <locale.h>
int main()
{
// TODO: try some strings containing characters like '[' that strcoll ignores completely.
const char * s[] = { "FooBar - abc", "Foobar - bcd", "FooBar - cde" };
#ifdef USE_LOCALE
setlocale(LC_ALL, ""); // empty string means look at env vars
#endif
strcoll(s[0], s[1]);
strcoll(s[0], s[2]);
strcoll(s[1], s[2]);
return 0;
}
output of gcc -DUSE_LOCALE -Og strcoll.c && ltrace ./a.out
(or run LANG=C ltrace a.out):
__libc_start_main(0x400586, 1, ...
setlocale(LC_ALL, "") = "en_CA.UTF-8" # my env contains LANG=en_CA.UTF-8
strcoll("FooBar - abc", "Foobar - bcd") = -1
strcoll("FooBar - abc", "FooBar - cde") = -2
strcoll("Foobar - bcd", "FooBar - cde") = -1
# the three strings are in order
+++ exited (status 0) +++
with gcc -Og -UUSE_LOCALE strcoll.c && ltrace ./a.out
:
__libc_start_main(0x400536, ...
# no setlocale, so current locale is C
strcoll("FooBar - abc", "Foobar - bcd") = -32
strcoll("FooBar - abc", "FooBar - cde") = -2
strcoll("Foobar - bcd", "FooBar - cde") = 32 # s[1] should sort after s[2], so it's out of order
+++ exited (status 0) +++
POSIX.1-2001 provides strcasecmp
. The POSIX spec says the results are "unspecified" for locales other than plain-ASCII, though, so I'm not sure whether common implementations handle utf-8 correctly or not.
See this post for portability issues with strcasecmp, e.g. to Windows. See other answers on that question for other C++ ways of doing case-insensitive string compares.
Once you have a case-insensitive comparison function, you can use it with other sort algorithms, like C standard lib qsort
, or c++ std::sort, instead of writing your own O(n^2) selection-sort.
As b.buchhold's answer points out, doing a case-insensitive comparison on the fly might be slower than converting everything to lowercase once, and sorting an array of indices. The lowercase-version of each strings is needed many times. std::strxfrm will transform a string so that strcmp
on the result will give the same result as strcoll
on the original string.
Upvotes: 1
Reputation: 38919
C++ provides you with sort
which takes a comparison function. In your case with a vector<string>
you'll be comparing two strings. The comparison function should return true
if the first argument is smaller.
For our comparison function we'll want to find the first mismatched character between the string
s after tolower
has been applied. To do this we can use mismatch
which takes a comparator between two characters returning true
as long as they are equal:
const auto result = mismatch(lhs.cbegin(), lhs.cend(), rhs.cbegin(), rhs.cend(), [](const unsigned char lhs, const unsigned char rhs){return tolower(lhs) == tolower(rhs);});
To decide if the lhs
is smaller than the rhs
fed to mismatch
we need to test 3 things:
string
s of unequal lengthstring lhs
shorterchar
from lhs
smaller than the first mismatched char
from rhs
This evaluation can be performed by:
result.second != rhs.cend() && (result.first == lhs.cend() || tolower(*result.first) < tolower(*result.second));
Ultimately, we'll want to wrap this up in a lambda and plug it back into sort
as our comparator:
sort(foo.begin(), foo.end(), [](const unsigned char lhs, const unsigned char rhs){
const auto result = mismatch(lhs.cbegin(), lhs.cend(), rhs.cbegin(), rhs.cend(), [](const unsigned char lhs, const unsigned char rhs){return tolower(lhs) == tolower(rhs);});
return result.second != rhs.cend() && (result.first == lhs.cend() || tolower(*result.first) < tolower(*result.second));
});
This will correctly sort vector<string> foo
. You can see a live example here: http://ideone.com/BVgyD2
EDIT:
Just saw your question update. You can use sort
with string array[]
as well. You'll just need to call it like this: sort(array, std::next(array, size),
...
Upvotes: 9
Reputation: 163
I use this lambda function to sort a vectors of strings:
std::sort(entries.begin(), entries.end(), [](const std::string& a, const std::string& b) -> bool {
for (size_t c = 0; c < a.size() and c < b.size(); c++) {
if (std::tolower(a[c]) != std::tolower(b[c]))
return (std::tolower(a[c]) < std::tolower(b[c]));
}
return a.size() < b.size();
});
Upvotes: 2
Reputation: 31
#include <algorithm>
#include <vector>
#include <string>
using namespace std;
void CaseInsensitiveSort(vector<string>& strs)
{
sort(
begin(strs),
end(strs),
[](const string& str1, const string& str2){
return lexicographical_compare(
begin(str1), end(str1),
begin(str2), end(str2),
[](const char& char1, const char& char2) {
return tolower(char1) < tolower(char2);
}
);
}
);
}
Upvotes: 3
Reputation: 44248
This solution is much simpler to understand than Jonathan Mee's and pretty inefficient, but for educational purpose could be fine:
std::string lowercase( std::string s )
{
std::transform( s.begin(), s.end(), s.begin(), ::tolower );
return s;
}
std::sort( array, array + length,
[]( const std::string &s1, const std::string &s2 ) {
return lowercase( s1 ) < lowercase( s2 );
} );
if you have to use your sort function, you can use the same approach:
....
minValue = lowercase( array[startScan] );
for (int index = startScan + 1; index < size; index++) {
const std::string &tstr = lowercase( array[index] );
if (tstr < minValue) {
minValue = tstr;
minIndex = index;
}
}
...
Upvotes: 0
Reputation: 3896
You could call tolower
on every character you compare. This is probably the easiest, yet not a great solution, becasue:
You could also replace the comparator by your own function. I.e. there will be some place where you compare something like stringone[i] < stringtwo[j]
or charA < charB
. change it to my_less_than(stringone[i], stringtwo[j])
and implement the exact ordering you want based.
another way would be to transform every string to lowercase once and create an array of pairs. then you base your comparisons on the lowercase value only, but you swap whole pairs so that your final strings will be in the right order as well.
finally, you can create an array with lowercase versions and sort this one. whenever you swap two elements in this one, you also swap in the original array.
note that all those proposals would still need proper handling of wide characters (if you need that at all)
Upvotes: 0