Canacourse

Reputation: 1865

Unicode / Non-Unicode / UTF-8 Problems

An application I am working on stores data in an INI file. The application creates the INI file which in turn will be read by another application we also created. The INI file may also be hand edited.

It is likely that sooner or later the INI file will contain text in different languages, so we were careful to ensure that all data written to this file is in Unicode format.

After creating the INI file initially, we examined it in Notepad and noticed that the letter spacing was screwed up. After a bit of research we discovered the Unicode Byte Order Mark (BOM) FF FE (the UTF-16 little-endian marker) and started writing it at the start of the file, and all seemed well: the file was created correctly and could be hand edited in Notepad.

Now the problem: we went looking for an INI file parser instead of creating our own. Boost Property Tree seemed ideal, but it seems the BOM is not filtered out by the underlying wifstream, and Property Tree eventually throws an exception because of this.

Next we tried SimpleINI, but SimpleINI (CSimpleIniW) does not seem to work unless the UTF-8 marker is at the start of the file.

So far, two seemingly well-developed INI file processors will not work with our simple INI file, so we started thinking we are taking the wrong approach. Apart from the obvious "Should have used XML", what real-world advice can you offer on this problem?

UPDATE:

I have this working now. The BOM wasn't the problem. The problem was that the data was not stored in UTF-8. Thanks....

Upvotes: 3

Views: 1742

Answers (3)

ZZ Coder

Reputation: 75496

If you intend to use Unicode in an INI file, a BOM is required. Without a BOM, the reader doesn't know which encoding the file is in: it could be UTF-16 (big or little endian) or UTF-8. This is a big drawback of the INI format. XML has a visible preamble in which you can specify the encoding, which makes it much easier to deal with.
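To illustrate how a reader can use the BOM to pick the encoding, here is a rough sketch. The `BomEncoding` enum and `detect_bom` function are names made up for this example, and the check only covers the UTF-8 and UTF-16 marks mentioned above.

```cpp
#include <cstddef>
#include <string>

// Encodings that can be recognised from a leading byte order mark.
// (Illustrative type, not part of any library.)
enum class BomEncoding { None, Utf8, Utf16LE, Utf16BE };

// Inspects the first bytes of a file's raw contents and reports which BOM,
// if any, is present. Note: a UTF-32 little-endian BOM (FF FE 00 00) also
// starts with FF FE and would be reported as Utf16LE by this sketch.
BomEncoding detect_bom(const std::string& bytes) {
    auto b = [&](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF)
        return BomEncoding::Utf8;
    if (bytes.size() >= 2 && b(0) == 0xFF && b(1) == 0xFE)
        return BomEncoding::Utf16LE;
    if (bytes.size() >= 2 && b(0) == 0xFE && b(1) == 0xFF)
        return BomEncoding::Utf16BE;
    return BomEncoding::None;  // no BOM: the encoding has to be guessed
}
```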

We use GetPrivateProfileStringW to read INI files in UTF-8 and haven't found any issues as long as the BOM is there.

If this is a Windows app, you really should switch to the registry. Otherwise, XML is the way to go.

Upvotes: 3

Rick Strahl

Reputation: 17681

Is there any reason you're not using the native Windows APIs for reading and writing the profiles? Using the native APIs should ensure that the data gets picked up consistently by both applications, since they'd be using the exact same APIs.

Upvotes: 1

Dor

Reputation: 7504

Use a text editor that removes the BOM, such as Notepad++.
There's no problem with removing the BOM, and this is a common solution in web development.

Upvotes: 1
