Reputation: 775
Is there a way to remove all non-alphabetic characters (e.g. , . ? ! and so on) from a std::string
without deleting Czech letters such as ščéř? I tried:
std::string FileHandler::removePunctuation(std::string word) {
for (std::string::iterator i = word.begin(); i != word.end(); i++) {
if (!isalpha(word.at(i - word.begin()))) {
word.erase(i);
i--;
}
}
return word;
}
but it also deletes the Czech characters.
Ideally, I'd also like to be able to lower-case those letters (something like toLowerCase).
Upvotes: 5
Views: 4601
Reputation: 23650
After calling std::setlocale(LC_ALL, "en_US.UTF-8"), you can use std::iswalpha() to check whether a character is a letter.
The following program
#include <clocale>   // std::setlocale
#include <cwctype>   // std::iswalpha
#include <iostream>
#include <string>
int main()
{
std::setlocale(LC_ALL, "en_US.UTF-8");
std::wstring youreWelcome = L"Není zač.";
for ( auto c : youreWelcome )
if ( std::iswalpha(c) )
std::wcout << c;
std::wcout << std::endl;
}
will print
Nenízač
to the console.
Note that std::setlocale() may not be thread-safe, either on its own or when it runs concurrently with functions that depend on the locale, such as std::iswalpha(). It should therefore only be called from single-threaded code, such as program start-up. Concretely, do not call std::setlocale() from within FileHandler::removePunctuation(); call only std::iswalpha() there.
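To tie this back to the question, here is a minimal sketch of what the function itself could look like once it works on std::wstring. It assumes std::setlocale() has already been called once at start-up, as described above. The free function name removePunctuationLower is made up for illustration, and the lower-casing step via std::towlower is my addition to cover the toLowerCase wish from the question.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// Hypothetical free-function variant of FileHandler::removePunctuation.
// Assumption: std::setlocale() was called once during program start-up,
// so this function only queries the locale and never changes it.
std::wstring removePunctuationLower(std::wstring word)
{
    // Erase-remove idiom: drop everything the current locale
    // does not classify as a letter.
    word.erase(std::remove_if(word.begin(), word.end(),
                              [](wchar_t ch) {
                                  return !std::iswalpha(static_cast<std::wint_t>(ch));
                              }),
               word.end());

    // Lower-case the remaining letters in place.
    std::transform(word.begin(), word.end(), word.begin(),
                   [](wchar_t ch) {
                       return static_cast<wchar_t>(
                           std::towlower(static_cast<std::wint_t>(ch)));
                   });
    return word;
}
```

Whether letters like š or ř survive depends on the locale that is active when the function runs, so the result for non-ASCII input is only as good as the setlocale() call made beforehand.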
Upvotes: 0
Reputation: 35454
You can use std::remove_if along with erase:
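#include <algorithm>
#include <cwctype>   // std::iswalnum
#include <string>
//...
std::wstring FileHandler::removePunctuation(std::wstring word)
{
    // Erase-remove idiom: drop every character that is neither a letter nor a digit.
    // The lambda takes wchar_t so wide characters are not truncated to char.
    word.erase(std::remove_if(word.begin(), word.end(),
        [](wchar_t ch){ return !std::iswalnum(ch); }), word.end());
    return word;
}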
Upvotes: 3
Reputation: 4808
Here's an idea:
#include <iostream>
#include <cwctype>
#include <string>
// if windows, add this: #include <io.h>
// if windows, add this: #include <fcntl.h>
int main()
{
// if windows, add this: _setmode( _fileno( stdout ), _O_U16TEXT );
std::wstring s( L"š1č2é3ř!?" );
for ( auto c : s )
if ( std::iswalpha( c ) )
std::wcout << c;
return 0;
}
Upvotes: 2
Reputation: 94
You might have to write a custom version of isalpha. From what you describe, it seems to return true only for a-z and A-Z, which is how it behaves in the default "C" locale.
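A rough sketch of such a custom check, assuming the text is held as wide characters (std::wstring) and that only Czech letters need to be recognised beyond ASCII. The name isCzechAlpha and the hard-coded letter list are illustrative, not part of any library, and the source file is assumed to be saved as UTF-8.

```cpp
#include <string>

// Hypothetical locale-independent replacement for isalpha.
// Assumption: only ASCII letters plus the Czech accented letters count.
bool isCzechAlpha(wchar_t ch)
{
    // Czech letters with diacritics, lower and upper case.
    static const std::wstring czechLetters =
        L"áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ";

    const bool asciiLetter =
        (ch >= L'a' && ch <= L'z') || (ch >= L'A' && ch <= L'Z');

    return asciiLetter || czechLetters.find(ch) != std::wstring::npos;
}
```

This avoids any dependency on the process locale, at the cost of having to extend the list by hand for other languages.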
Upvotes: -1