pouzzler

Reputation: 1834

Impossible to put stdout in wide char mode

On my system, a fairly standard Ubuntu 13.10, French accented characters such as "éèàçù" are handled correctly by every tool I use, even though my LC_ environment variables are set to en_US.UTF-8. In particular, command-line utilities such as grep and cat read and print these characters without a hitch.

Yet a program as small as

#include <stdio.h>

int main() {
  printf("%c", getchar());
  return 0;
}

fails when the user enters "é".

From the man pages and a lot of googling, I gather that there is no standard way to close stdout and then reopen it. According to the fwide() man page, once stdout is in byte mode I can't switch it to wide-character mode short of closing and reopening it, so I can't use getwchar() and wprintf().

I can't believe that every single utility like cat, grep, etc. reimplements its own way of handling wide characters, yet from my research I see no other way.

Is it my system that has a problem? I can't see how since every utility works flawlessly. What am I missing, please?

Upvotes: 1

Views: 870

Answers (3)

caf

Reputation: 239051

When a C program starts, stdout, stdin and stderr are neither byte nor wide-character oriented. fwide(stdin, 0) should return 0 at this point.

If you expand your minimal program to:

#include <stdio.h>
#include <locale.h>
#include <wchar.h>

int main()
{
        setlocale(LC_ALL, "");
        printf("%lc\n", getwchar());
        return 0;
}

Then it should work as you expect. (There is no need to explicitly set the orientation of stdin here - since the first operation on it is a wide-character operation, it will have wide-character orientation).

You do need to use getwchar() instead of getchar() if you want to read a wide character with it, though.

Upvotes: 3

unwind

Reputation: 399863

The utilities you mention are generally line-oriented. If you were to read a whole line with e.g. fgets() rather than a single character, I think it would work for you, too.

When you start reading single characters (which may be just bytes, and often are), you are of course very much susceptible to encoding issues.

Reading full lines will work just fine, as long as the line-termination encoding is not misunderstood (and for UTF-8 it won't be).

Upvotes: 0

Amit Chauhan

Reputation: 6879

UTF-8 characters are read as bytes, not as characters, and non-ASCII characters take more than one byte. Check this question for more info.

Upvotes: 0
