Pierre Collard Suero

Reputation: 73

How can I read accented characters in C++ and use them with isalnum?

I am programming in French and, because of that, I need to use accented characters. I can output them by using #include <locale> and setlocale(LC_ALL, ""), but there seems to be a problem when I read accented characters. Here is a simple example I made to show the problem:

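#include <cctype>     // isalnum
#include <clocale>    // setlocale
#include <cstdlib>    // system
#include <cstring>    // strchr
#include <iostream>
#include <locale>
#include <string>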

using namespace std;

const string SymbolsAllowed = "+-*/%";

int main()
{
    setlocale(LC_ALL, "");    // makes accents printable

    // Translation: Please write some accented text
    // 'é' is shown correctly:
    cout << "Veuillez écrire du texte accentué : ";

    string accentedString;
    getline(cin, accentedString);

    // Accented chars are not shown correctly:
    cout << "Accented string written : " << accentedString << endl;

    for (unsigned int i = 0; i < accentedString.length(); ++i)
    {
        char currentChar = accentedString.at(i);

        // The program crashes while testing if currentChar is alphanumeric.
        // (error image below) :
        if (!isalnum(currentChar) && !strchr(SymbolsAllowed.c_str(), currentChar))
        {
            cout << endl << "Character not allowed : " << currentChar << endl;
            system("pause");
            return 1;
        }
    }

    cout << endl << "No unauthorized characters were written." << endl;

    system("pause");
    return 0;
}

Here is an example of the output before the program crashes:

Veuillez écrire du texte accentué : éèàìù
Accented string written : ʾS.?—

I noticed the debugger from Visual Studio shows that I have written something different than what it outputs :

[0] -126 '‚'    char
[1] -118 'Š'    char
[2] -123 '…'    char
[3] -115 ''     char
[4] -105 '—'    char

The error seems to say that only values between -1 and 255 can be used but, according to the ASCII table, the values of the accented characters I used in the example above do not exceed this limit.

The error dialog that pops up reads: Expression: c >= -1 && c <= 255

Can someone please tell me what I am doing wrong or give me a solution for this? Thank you in advance. :)

Upvotes: 3

Views: 5352

Answers (2)

rici

Reputation: 241921

  1. char is a signed type on your system (indeed, on many systems) so its range of values is -128 to 127. Characters whose codes are between 128 and 255 look like negative numbers if they are stored in a char, and that is actually what your debugger is telling you:

    [0] -126 '‚'    char
    

    That's -126, not 126. In other words, 130 or 0x82.

  2. isalnum and friends take an int argument which (as the error message indicates) must be either EOF (-1 on your system) or in the range 0-255. -126 is not in this range, hence the error. You could cast to unsigned char or (probably better, if it works on Windows) use the two-argument std::isalnum in <locale>.

  3. For reasons which totally escape me, Windows seems to be providing console input in CP-437 but processing output in CP-1252. The high halves of those two code pages are completely different. So when you type é, it gets sent to your program as 130 (0x82) from CP-437, but when you send that same byte back to the console, it gets printed according to CP-1252 as a low single quote (which looks a lot like a comma, but isn't). So that's not going to work. You need to get input and output to be on the same code page.

  4. I don't know a lot about Windows, but you can probably find some useful information in the MS docs. That page includes links to Windows-specific functions which set the input and output code pages.

  5. Intriguingly, the accented characters in the source code of your program appear to be CP-1252, since they print correctly. If you decide to move away from code page 1252 -- for example, by adopting Unicode -- you'll have to fix your source code as well.
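
To make points 1 and 2 concrete, here is a minimal sketch. The helper names byteValue and isAlnumIn are made up for illustration and are not part of any library. The first recovers the 0-255 value from a possibly negative char; the second wraps the two-argument std::isalnum from <locale>, which takes the char itself and so avoids the int conversion entirely.

```cpp
#include <cassert>
#include <locale>

// Returns the byte value (0-255) stored in a possibly negative char.
int byteValue(char c)
{
    const int value = static_cast<unsigned char>(c);
    // Same range the debug runtime asserts on; it always holds after the cast.
    assert(value >= 0 && value <= 255);
    return value;
}

// Two-argument std::isalnum from <locale>: it takes the char directly,
// so a negative char is never converted to an out-of-range int.
bool isAlnumIn(char c, const std::locale& loc)
{
    return std::isalnum(c, loc);
}
```

With the classic "C" locale, isAlnumIn reports the byte 0x82 as non-alphanumeric. Whether a named locale such as std::locale("") classifies it as a letter depends on the platform and on the code page that locale uses.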

Upvotes: 1

Jerry Coffin

Reputation: 490653

With the is* and to* functions, you really need to cast the input to unsigned char before passing it to the function:

if (!isalnum((unsigned char)currentChar) && !strchr(SymbolsAllowed.c_str(), currentChar)) {

While you're at it, I'd advise against using strchr as well, and switch to something like this:

std::string SymbolsAllowed = "+-*/%";

if (... && SymbolsAllowed.find(currentChar) == std::string::npos)
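
Putting the two fragments above together, a per-character check could look like the sketch below. The function name isAllowed is an assumption for illustration; SymbolsAllowed is the string from the question.

```cpp
#include <cassert>
#include <cctype>
#include <string>

const std::string SymbolsAllowed = "+-*/%";

// True if c is alphanumeric in the current C locale or one of SymbolsAllowed.
bool isAllowed(char c)
{
    // Cast first so bytes above 127 reach isalnum as 128-255, not as negatives.
    const bool alnum = std::isalnum(static_cast<unsigned char>(c)) != 0;
    return alnum || SymbolsAllowed.find(c) != std::string::npos;
}

// Usage: a few checks under the default "C" locale.
void isAllowedExample()
{
    assert(isAllowed('a'));
    assert(isAllowed('%'));
    assert(!isAllowed(' '));
}
```

The loop in the question then reduces to calling isAllowed(currentChar) and bailing out on the first false result.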

You should also probably forget that you ever even heard of the exit function; you should never use it in C++. When exiting from main, as here, you should just return. Otherwise, throw an exception (and if you want to exit the program, catch the exception in main and return from there).
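
A rough sketch of that throw-and-catch pattern follows. The names requireAllowed and run are hypothetical, and run stands in for the body of main so that the exit code is produced by an ordinary return.

```cpp
#include <cassert>
#include <cctype>
#include <iostream>
#include <stdexcept>
#include <string>

// Throws instead of calling exit() when a disallowed character is found.
void requireAllowed(const std::string& text, const std::string& symbols)
{
    for (char c : text) {
        const bool alnum = std::isalnum(static_cast<unsigned char>(c)) != 0;
        if (!alnum && symbols.find(c) == std::string::npos)
            throw std::invalid_argument(std::string("Character not allowed: ") + c);
    }
}

// Stand-in for main's body: catch the exception and turn it into an exit code.
int run(const std::string& text)
{
    try {
        requireAllowed(text, "+-*/%");
    } catch (const std::invalid_argument& e) {
        std::cerr << e.what() << '\n';
        return 1;
    }
    return 0;
}

// Usage: the return value is what main would return.
void runExample()
{
    assert(run("2*3+4") == 0);
    assert(run("2 * 3") == 1);  // the space is rejected
}
```

Because the exception unwinds the stack, destructors of local objects still run, which is the main thing exit would skip.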

If I were writing this, I'd do the job somewhat differently in general though. std::string already has a function to do most of what your loop is trying to accomplish, so I'd set up symbolsAllowed to include all the symbols you want to allow, then just do a search for anything it doesn't contain:

// Add all the authorized characters to the string.
// The counter is an int so that 255 is included and the loop still terminates:
for (int a = 0; a <= std::numeric_limits<unsigned char>::max(); ++a)
    if (isalnum(a) || isspace(a)) // you probably want to allow spaces?
        symbolsAllowed += static_cast<char>(a);

// ...

auto pos = accentedString.find_first_not_of(symbolsAllowed);
if (pos != std::string::npos) {
    std::cout << "Character not allowed: " << accentedString[pos];
    return 1;
}
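
For reference, the same idea wrapped into two functions might look like this. It is only a sketch: buildAllowed and firstDisallowed are names invented here, and the loop counter is an int so that the value 255 is covered without the counter wrapping around.

```cpp
#include <cassert>
#include <cctype>
#include <limits>
#include <string>

// Builds the set of allowed bytes: extraSymbols plus every byte that the
// current C locale classifies as alphanumeric or whitespace.
std::string buildAllowed(const std::string& extraSymbols)
{
    std::string allowed = extraSymbols;
    for (int a = 0; a <= std::numeric_limits<unsigned char>::max(); ++a)
        if (std::isalnum(a) || std::isspace(a))
            allowed += static_cast<char>(a);
    return allowed;
}

// Index of the first character of text not in allowed, or std::string::npos.
std::string::size_type firstDisallowed(const std::string& text,
                                       const std::string& allowed)
{
    return text.find_first_not_of(allowed);
}

// Usage under the default "C" locale.
void firstDisallowedExample()
{
    const std::string allowed = buildAllowed("+-*/%");
    assert(firstDisallowed("abc 123+", allowed) == std::string::npos);
    assert(firstDisallowed("ab!c", allowed) == 2);
}
```

Note that the allowed set is built from whatever locale is active when buildAllowed runs, so it should be called after setlocale if accented letters are meant to pass.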

Upvotes: 1
