Reputation: 8185
Considering the method:
void Capitalize(std::string &s)
{
    bool shouldCapitalize = true;
    for(size_t i = 0; i < s.size(); i++)
    {
        if (iswalpha(s[i]) && shouldCapitalize == true)
        {
            s[i] = (char)towupper(s[i]);
            shouldCapitalize = false;
        }
        else if (iswspace(s[i]))
        {
            shouldCapitalize = true;
        }
    }
}
It works perfectly for ASCII characters, e.g.
"steve" -> "Steve"
However, once I use non-Latin characters, e.g. the Cyrillic alphabet, I don't get that result:
"стив" -> "стив"
Why does this method fail for non-Latin alphabets? I've tried using isalpha as well as iswalpha, but I get exactly the same result.
How can I modify this method so that it capitalizes non-Latin alphabets?
Note: I'd prefer to solve this without a third-party library such as icu4c; with one, it would be a very simple problem to solve.
Update:
This solution doesn't work (for some reason):
void Capitalize(std::string &s)
{
    bool shouldCapitalize = true;
    std::locale loc("ru_RU"); // Creating a locale that supports cyrillic alphabet
    for(size_t i = 0; i < s.size(); i++)
    {
        if (isalpha(s[i], loc) && shouldCapitalize == true)
        {
            s[i] = (char)toupper(s[i], loc);
            shouldCapitalize = false;
        }
        else if (isspace(s[i], loc))
        {
            shouldCapitalize = true;
        }
    }
}
Upvotes: 2
Views: 138
Reputation: 2859
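std::locale works, at least where the requested locale is installed on the system. However, you are using it incorrectly: a std::string holds bytes, and if the text is UTF-8 encoded (typical on Linux), each Cyrillic letter occupies two bytes, so classifying or converting one char at a time never sees a whole letter. Use std::wstring, where each Cyrillic letter is a single element.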
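To see why the char-by-char version cannot work on UTF-8 input, it helps to compare the byte count with the character count. The following is only an illustrative sketch: CountCodePoints is a hypothetical helper (not part of the standard library), and it assumes the string is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 code points in a string that is
// assumed to be valid UTF-8. Continuation bytes have the bit pattern
// 10xxxxxx, so every byte that does NOT match it starts a new code point.
std::size_t CountCodePoints(const std::string &utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8)
    {
        if ((byte & 0xC0) != 0x80)
        {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 encoding of "стив", s.size() is 8 while CountCodePoints(s) is 4, so a loop over s[i] only ever inspects half of a letter at a time.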
The following code works as expected on Ubuntu with the Russian locale installed:
#include <iostream>
#include <locale>
#include <string>
#include <codecvt>
void Capitalize(std::wstring &s)
{
    bool shouldCapitalize = true;
    std::locale loc("ru_RU.UTF-8"); // Creating a locale that supports cyrillic alphabet
    for(size_t i = 0; i < s.size(); i++)
    {
        if (isalpha(s[i], loc) && shouldCapitalize == true)
        {
            s[i] = toupper(s[i], loc);
            shouldCapitalize = false;
        }
        else if (isspace(s[i], loc))
        {
            shouldCapitalize = true;
        }
    }
}
int main()
{
    std::wstring in = L"это пример текста";
    Capitalize(in);
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv1;
    std::string out = conv1.to_bytes(in);
    std::cout << out << "\n";
    return 0;
}
It's possible that on Windows you need to use a different locale name; I'm not sure.
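Since the std::locale constructor throws std::runtime_error when the named locale is not available, one way to cope with differing locale names is to try several candidates in turn. This is a minimal sketch, not a definitive solution: FirstAvailableLocale is a hypothetical helper, and which names actually exist on a given system is an assumption you would need to check.

```cpp
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns the first locale from `names` that can be
// constructed on this system, or the classic "C" locale if none can.
std::locale FirstAvailableLocale(const std::vector<std::string> &names)
{
    for (const std::string &name : names)
    {
        try
        {
            return std::locale(name.c_str());
        }
        catch (const std::runtime_error &)
        {
            // This locale is not installed or not recognized; try the next.
        }
    }
    return std::locale::classic();
}
```

A call might look like FirstAvailableLocale({"ru_RU.UTF-8", "Russian_Russia.1251"}), where the second name is an assumed Windows-style spelling that I have not verified. If the result is the "C" locale, Cyrillic letters will not be capitalized.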
Upvotes: 2
Reputation: 694
Well, IMHO an external library is the only practical choice. The standard functions work well with Latin, but any other locale is a pain, and I wouldn't bother. Still, if you want support for Latin and Cyrillic without an external library, you can just write it yourself:
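#include <cwctype>

// Upper-cases basic Latin and Russian Cyrillic letters by hand;
// anything else is passed on to towupper.
wchar_t to_upper(wchar_t c) {
    // Latin: a-z map to A-Z
    if (c >= L'a' && c <= L'z') return c - L'a' + L'A';
    // Cyrillic: а-я map to А-Я
    if (c >= L'а' && c <= L'я') return c - L'а' + L'А';
    // ё lies outside the а-я range, so it is handled separately
    if (c == L'ё') return L'Ё';
    return towupper(c);
}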
Note, however, that this approach requires painstakingly implementing support for every alphabet you need, and it doesn't even cover all Latin characters (accented letters, for example, fall through to towupper), so an external library is the best solution. Consider this approach only if you're sure that only English and Russian are going to be used.
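For completeness, here is a sketch of how a hand-written mapping could be plugged into the word-capitalizing loop from the question without relying on any installed locale. All names here (IsLatinOrRussianLetter, ToUpperLatinOrRussian, CapitalizeWords) are hypothetical, only basic Latin and Russian Cyrillic are covered, and the letters are written as \u escapes so the result does not depend on the source file encoding.

```cpp
#include <cwctype>
#include <string>

// Hypothetical helper: true for a-z, A-Z, А-я (U+0410..U+044F), Ё and ё.
bool IsLatinOrRussianLetter(wchar_t c)
{
    return (c >= L'a' && c <= L'z') || (c >= L'A' && c <= L'Z') ||
           (c >= L'\u0410' && c <= L'\u044F') ||
           c == L'\u0401' || c == L'\u0451';
}

// Hypothetical helper: upper-cases basic Latin and Russian Cyrillic only.
wchar_t ToUpperLatinOrRussian(wchar_t c)
{
    if (c >= L'a' && c <= L'z')
        return static_cast<wchar_t>(c - L'a' + L'A');
    if (c >= L'\u0430' && c <= L'\u044F')      // а..я -> А..Я
        return static_cast<wchar_t>(c - 0x20);
    if (c == L'\u0451')                        // ё -> Ё
        return L'\u0401';
    return c;
}

// Capitalizes the first letter of every whitespace-separated word.
void CapitalizeWords(std::wstring &s)
{
    bool shouldCapitalize = true;
    for (wchar_t &c : s)
    {
        if (std::iswspace(static_cast<std::wint_t>(c)))
        {
            shouldCapitalize = true;
        }
        else if (shouldCapitalize && IsLatinOrRussianLetter(c))
        {
            c = ToUpperLatinOrRussian(c);
            shouldCapitalize = false;
        }
    }
}
```

The trade-off is the one described above: this avoids any dependency on installed locales, but every additional alphabet has to be added by hand.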
Upvotes: 0