Reputation: 880
I am having an issue with "umlauts" (letters ä, ü, ö, ...) and ifstream in C++.
I use curl to download an html page and ifstream to read in the downloaded file line by line and parse some data out of it. This goes well until I have a line like one of the following:
te="Olimpija Laibach - Tromsö";
te="Burghausen - Münster";
My code parses these lines and outputs them as follows:
Olimpija Laibach vs. Troms?
Burghausen vs. M?nster
Things like outputting umlauts directly from the code work:
cout << "öäü" << endl; // This works fine
My code looks somewhat like this:
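ifstream fin("file");
string line;
// Loop on getline itself; testing eof() first would process a failed read.
while (getline(fin, line)) {
    // find() returns string::npos (an unsigned value) when nothing matches.
    if (line.find("te=") != string::npos) {
        string::size_type pos = line.find(" - ");
        string team1 = line.substr(4, pos - 4);
        string team2 = line.substr(pos + 3, line.length() - pos - 6);
        cout << team1 << " vs. " << team2 << endl;
    }
}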
Edit: The weird thing is that the same code (the only changed things are the source and the delimiters) works for another text input file (same procedure: download with curl, read with ifstream). Parsing and outputting a line like the following is no problem:
<span id="...">Fernwärme Vienna</span>
Upvotes: 3
Views: 2831
Reputation: 153977
What's the locale embedded in fin? In the code you show, it would be the global locale, which, if you haven't reset it, is "C".
If you're anywhere outside the Anglo-Saxon world (and the strings you show suggest that you are), one of the first things you do in main should be
std::locale::global( std::locale( "" ) );
This sets the global locale (and thus the default locale for any streams opened later) to the locale being used in the surrounding environment. (Formally, to an implementation-defined native environment, but in practice, to whatever the user is using.) In the "C" locale, the encoding is almost always ASCII; ASCII doesn't recognize umlauts, and according to the standard, illegal encodings in input should be replaced with an implementation-defined character (IIRC; it's been some time since I've actually reread this section). In output, of course, you're not supposed to have any unknown characters, so the implementation doesn't check for them, and they go through.
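As a rough sketch of the ordering described above: the global locale has to be set before the stream is constructed, because a stream copies the global locale at construction time. The helper names use_environment_locale and locale_of_new_stream are made up for illustration, and the try/catch is an added assumption, since std::locale( "" ) can throw std::runtime_error when the environment names a locale the system doesn't have.

```cpp
#include <fstream>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: install the environment's locale as the global one.
// Returns false when the environment names a locale the system lacks.
bool use_environment_locale()
{
    try {
        std::locale::global(std::locale(""));
        return true;
    } catch (const std::runtime_error&) {
        return false;  // the previous global locale is left in place
    }
}

// Streams copy the global locale when they are constructed, so a stream
// created after use_environment_locale() reports the new locale.
std::string locale_of_new_stream()
{
    std::ifstream fin;  // not opened; only its locale is inspected
    return fin.getloc().name();
}
```

In a real program, use_environment_locale() would be called at the top of main, before the ifstream for the downloaded file is created.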
Since std::cin, etc. are opened before you have a chance to set the global locale, you'll have to imbue them with std::locale( "" ) specifically.
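A minimal sketch of imbuing the standard streams might look like the following. The function names environment_locale and imbue_standard_streams are hypothetical, and falling back to the classic "C" locale when std::locale( "" ) throws is an assumption, not something the standard requires.

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>

// Hypothetical helper: the environment's locale, or the classic "C" locale
// if the environment names one that is not installed.
std::locale environment_locale()
{
    try {
        return std::locale("");
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// The standard streams already exist when main starts, so changing the
// global locale does not affect them; each one has to be imbued explicitly.
void imbue_standard_streams(const std::locale& loc)
{
    std::cin.imbue(loc);
    std::cout.imbue(loc);
    std::cerr.imbue(loc);
}
```

Calling imbue_standard_streams(environment_locale()) early in main, before any input or output, would cover the narrow standard streams; the wide ones (std::wcin and friends) would need the same treatment.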
If this doesn't work, you might have to find some specific locale to use.
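One way to look for a specific locale, sketched under the assumption that you have a list of candidate names to try: locale names are platform-dependent (something like "de_DE.UTF-8" on many Unix systems is only an example, not a guaranteed name), so the hypothetical helper below tries each one and falls back to the classic "C" locale.

```cpp
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: return the first locale in `names` that the system
// can construct, or the classic "C" locale if none of them is available.
std::locale first_available_locale(const std::vector<std::string>& names)
{
    for (const std::string& name : names) {
        try {
            return std::locale(name);
        } catch (const std::runtime_error&) {
            // Unknown name on this system; try the next candidate.
        }
    }
    return std::locale::classic();
}
```

The result can then be imbued on the ifstream before reading, for example fin.imbue(first_available_locale(candidates)), where candidates is whatever list of names applies to your platform.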
Upvotes: 2