Jakub Gruber

Reputation: 775

Remove all non-alpha characters from string

Is there a way to remove all non-alphabetic characters (e.g. , . ? !) from a std::string without deleting Czech letters such as ščéř? I tried:

std::string FileHandler::removePunctuation(std::string word) {
    for (std::string::iterator i = word.begin(); i != word.end(); i++) {
        if (!isalpha(word.at(i - word.begin()))) {
            word.erase(i);
            i--;
        }
    }
    return word;    
}

but it also deletes the Czech characters.

Ideally, I'd also like to be able to lower-case those letters (something like toLowerCase).

Upvotes: 5

Views: 4601

Answers (4)

Ralph Tandetzky

Reputation: 23650

After calling std::setlocale(LC_ALL, "en_US.UTF-8"), you can use std::iswalpha() to check whether a character is a letter.

So the following program

#include <clocale>   // std::setlocale
#include <cwctype>   // std::iswalpha
#include <iostream>
#include <string>

int main()
{
    std::setlocale(LC_ALL, "en_US.UTF-8");
    std::wstring youreWelcome = L"Není zač.";

    for ( auto c : youreWelcome )
        if ( std::iswalpha(c) )
            std::wcout << c;

    std::wcout << std::endl;
}

will print

Nenízač

to the console.

Note that std::setlocale() is not guaranteed to be thread-safe, either on its own or while locale-dependent functions such as std::iswalpha() run concurrently. Call it only from single-threaded code, such as program start-up. Concretely, do not call std::setlocale() inside FileHandler::removePunctuation(); call only std::iswalpha() there.
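To illustrate that split, here is a minimal sketch of the filtering function on its own. It assumes the locale was already set once at start-up and that the text is held in a std::wstring. The name removePunctuationAndLower is hypothetical (it is not from the question), and the lower-casing step via std::towlower is added only because the question asks for it.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// Hypothetical helper, not part of the original FileHandler class.
// Assumes std::setlocale() was called once at program start-up;
// this function itself never touches the locale.
std::wstring removePunctuationAndLower(std::wstring word)
{
    // Erase-remove idiom: drop everything the current locale
    // does not classify as a letter.
    word.erase(std::remove_if(word.begin(), word.end(),
                   [](wchar_t ch) { return !std::iswalpha(ch); }),
               word.end());

    // Lower-case the remaining letters in place.
    std::transform(word.begin(), word.end(), word.begin(),
                   [](wchar_t ch) { return static_cast<wchar_t>(std::towlower(ch)); });
    return word;
}
```

Which non-ASCII characters count as letters depends on the locale that is active when the function runs, so the Czech letters are only kept if a suitable locale is installed and selected.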

Upvotes: 0

PaulMcKenzie

Reputation: 35454

You can use std::remove_if along with erase:

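#include <algorithm>
#include <cwctype>   // std::iswalnum (wide-character classification)
#include <string>
//...
std::wstring FileHandler::removePunctuation(std::wstring word)
{
    // Erase-remove idiom: drop every character that is neither a letter nor a digit.
    // The lambda takes wchar_t; taking char would truncate wide characters.
    word.erase(std::remove_if(word.begin(), word.end(),
                  [](wchar_t ch){ return !std::iswalnum(ch); }), word.end());
    return word;
}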

Upvotes: 3

zdf

Reputation: 4808

Here's an idea:

#include <iostream>
#include <cwctype>
#include <string>
// if windows, add this: #include <io.h>
// if windows, add this: #include <fcntl.h>

int main()
{
  // if windows, add this: _setmode( _fileno( stdout ), _O_U16TEXT );
  std::wstring s( L"š1č2é3ř!?" );
  for ( auto c : s )
    if ( std::iswalpha( c ) )
      std::wcout << c;
  return 0;
}

Upvotes: 2

Cornelis de Mooij

Reputation: 94

You might have to write a custom version of isalpha. From what you describe, it seems to return true only for a-z and A-Z.
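As a rough sketch of what such a custom check could look like, the code below treats ASCII letters plus an explicit list of Czech accented letters as alphabetic, independent of any locale. Everything here is an assumption for illustration: the names isCzechAlpha and removeNonCzechAlpha are hypothetical, the text is assumed to be already decoded into a std::u32string (one code point per element), and the letter list covers only the Czech alphabet.

```cpp
#include <algorithm>
#include <string>

// Hypothetical locale-independent replacement for isalpha.
// Returns true for a-z, A-Z and the Czech accented letters listed below.
bool isCzechAlpha(char32_t c)
{
    if ((c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z'))
        return true;

    // Written as escapes so the result does not depend on the source encoding.
    static const std::u32string accented =
        // lower case: á č ď é ě í ň ó ř š ť ú ů ý ž
        U"\u00E1\u010D\u010F\u00E9\u011B\u00ED\u0148\u00F3"
        U"\u0159\u0161\u0165\u00FA\u016F\u00FD\u017E"
        // upper case: Á Č Ď É Ě Í Ň Ó Ř Š Ť Ú Ů Ý Ž
        U"\u00C1\u010C\u010E\u00C9\u011A\u00CD\u0147\u00D3"
        U"\u0158\u0160\u0164\u00DA\u016E\u00DD\u017D";
    return accented.find(c) != std::u32string::npos;
}

// Removes every code point that isCzechAlpha() rejects.
std::u32string removeNonCzechAlpha(std::u32string word)
{
    word.erase(std::remove_if(word.begin(), word.end(),
                   [](char32_t c) { return !isCzechAlpha(c); }),
               word.end());
    return word;
}
```

The trade-off is that the set of accepted letters is hard-coded, so text in other languages would need its own list, whereas the locale-based functions delegate that decision to the C library.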

Upvotes: -1
