C - Strange behavior with fprintf

Question

I have a project where the program saves some messages into a log file. At the start of my program, I open this file, and various functions write text to it. As I'm Brazilian, we have special characters like ç, ã and others.

The problem I'm having is that in one of the files, I have the following line:

fprintf(logfile, "\nArquivo \"Instruções.txt\" criado\n");

and it works perfectly. The log file is correct.

Another file in the same project has the following line:

fprintf(logfile, "Carregando configurações\n");

and the logfile is incorrect with 'Ã§Ãµ' instead of 'çõ'.

It's important to note that I didn't close and reopen the file in the meantime; it's the same file from beginning to end, and the second line I've shown is the first command that writes to the file.

Another point is that if I copy that first line to the file of the second line, I get the same strange symbols, and the same problem if doing the opposite.

Now I don't have any idea what's going on. I've tried using %c with the numeric codes of the characters (the method I use when writing to the console), but I get the same problem.

Thanks very much!

EDIT: I don't know if it was clear, but in the same log file there are both correct and incorrect letters. It's just that the commands are in different source files of the project.

John Chadwick · Accepted Answer

It looks like the log file is being interpreted as some old encoding. Nowadays, most text is encoded as Unicode. C does not care what encoding your string is in, but since most things are UTF-8 these days, it's unlikely that you ended up with any other kind of string.

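In this case we can tell, because UTF-8 multi-byte sequences look like this when they are read as a single-byte encoding such as Latin-1 or Windows-1252: each accented letter turns into two characters, and for Portuguese letters the first one is Ã. When you go the other way around and interpret some other encoding as UTF-8, you'll see Unicode replacement characters instead.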
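
To illustrate how UTF-8 text gets misread, here is a minimal sketch in C. The function name misread_utf8_as_latin1 is hypothetical, and the sketch assumes the viewer is using Latin-1 (ISO-8859-1); which encoding the asker's editor actually used is not known. It treats each input byte as one Latin-1 character and re-encodes it as UTF-8, which is roughly what such a viewer ends up displaying.

```c
/* Treat every byte of `in` as a Latin-1 code point and re-encode it as UTF-8.
   This mimics a viewer that opens a UTF-8 file as if it were Latin-1.
   `out` must have room for 2 * strlen(in) + 1 bytes. */
void misread_utf8_as_latin1(const char *in, char *out)
{
    while (*in != '\0') {
        unsigned char b = (unsigned char)*in++;
        if (b < 0x80) {
            /* ASCII bytes mean the same thing in both encodings. */
            *out++ = (char)b;
        } else {
            /* Code points U+0080..U+00FF need two bytes in UTF-8. */
            *out++ = (char)(0xC0 | (b >> 6));   /* lead byte */
            *out++ = (char)(0x80 | (b & 0x3F)); /* continuation byte */
        }
    }
    *out = '\0';
}
```

Feeding it the UTF-8 bytes for çõ (C3 A7 C3 B5) produces the UTF-8 text for Ã§Ãµ, the same symbols reported in the question.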

If you are using Notepad and can't figure out a way to open the file as UTF-8, try another text editor. There are plenty: gedit, Notepad++, etc.

Also: if you're wondering why it's not all gibberish, that's because UTF-8 is an ASCII-compatible encoding. The first 128 code points are encoded identically in UTF-8 and ASCII. A lot of character sets share this property, or at least are close to it.
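
The ASCII compatibility described above also gives a quick way to inspect a string: only bytes with the high bit set can belong to non-ASCII characters. A small sketch, using the hypothetical helper name count_non_ascii_bytes:

```c
#include <stddef.h>

/* Count the bytes in `s` that fall outside the 7-bit ASCII range.
   ASCII text always yields 0, whatever ASCII-compatible encoding is in use. */
size_t count_non_ascii_bytes(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((unsigned char)*s >= 0x80) {
            ++n;
        }
    }
    return n;
}
```

For "Carregando configurações\n", a UTF-8 literal gives 4 (two bytes each for ç and õ), while a Latin-1 literal gives 2. Comparing the counts from the two source files might show whether their string literals were compiled in different encodings, though that is only a guess at the cause here.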

Edit: Upon further questions, it does appear something fishy is going on. At the very least, though, it is definitely some weird encoding issue.
