Frank-Rene Schäfer

Reputation: 3352

Issue with ICONV writing BOM if output is platform endian

When I choose UTF-32 to get the platform-dependent endianness, libiconv converts correctly but prefixes a BOM (0xFEFF) to the output stream. This causes some trouble.

When I choose UCS-4, no BOM is written, but on my system it converts to big endian, which is not the endianness of my system.

Are there any suggestions for how to convert to UTF-32/UCS-4 with the platform-dependent endianness without having to remove the BOM manually?

Upvotes: 1

Views: 316

Answers (2)

Bruno Haible

Reputation: 1282

iconv (both the glibc implementation and the GNU libiconv implementation) supports encoding names that specify a fixed endianness:

  • UTF-32LE = UCS-4LE : UCS-4 in little endian flavour, without BOM
  • UTF-32BE = UCS-4BE : UCS-4 in big endian flavour, without BOM
  • UTF-16LE : UTF-16 in little endian flavour, without BOM
  • UTF-16BE : UTF-16 in big endian flavour, without BOM
  • wchar_t (an alias for UCS-4-INTERNAL) : UCS-4 with the platform's endianness and alignment restrictions

Note that strings in these encodings are best not transported to other machines, since the lack of a BOM would cause problems there.
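As a rough sketch of how the fixed-endianness names can be used to get native byte order without a BOM: detect the machine's endianness at run time and pass either UTF-32LE or UTF-32BE to iconv_open(). The helper names native_utf32_name() and utf8_to_native_utf32() are made up for this illustration, and the input is assumed to be a NUL-terminated UTF-8 string that fits into the caller's buffer.

```c
#include <iconv.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pick the BOM-free UTF-32 name matching this machine. */
static const char *native_utf32_name(void)
{
    const uint32_t probe = 1;
    unsigned char first_byte;

    /* On a little-endian machine the least significant byte is stored first. */
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1 ? "UTF-32LE" : "UTF-32BE";
}

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to native-endian
   UTF-32 code units. Returns the number of code units written, or (size_t)-1
   on failure (unknown encoding, invalid input, or output buffer too small). */
static size_t utf8_to_native_utf32(const char *utf8, uint32_t *out,
                                   size_t out_capacity)
{
    iconv_t cd = iconv_open(native_utf32_name(), "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv() takes char **, so the const has to be cast away here. */
    char *in_ptr = (char *)utf8;
    size_t in_left = strlen(utf8);
    char *out_ptr = (char *)out;
    size_t out_size = out_capacity * sizeof(uint32_t);
    size_t out_left = out_size;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    return (out_size - out_left) / sizeof(uint32_t);
}
```

Because the encoding name fixes the byte order, the first code unit in the output buffer is the first character of the input rather than U+FEFF. With GNU libiconv on systems where it is not part of libc, linking with -liconv may be required.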

Upvotes: 0

一二三

Reputation: 21249

If you don't specify a byte order, the default is always big endian. To use the byte order of the current platform, use the special UCS-4-INTERNAL (or UCS-2-INTERNAL) encoding.
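To see the big-endian default for yourself, a small check like the following can help. It is only a sketch: the helper name utf8_to_ucs4_bytes() is invented for this example, and it uses the plain "UCS-4" name from the question, not the -INTERNAL variant, so that the raw output bytes can be inspected.

```c
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to plain "UCS-4"
   and store the raw output bytes in out. Returns the number of bytes written,
   or (size_t)-1 on failure. */
static size_t utf8_to_ucs4_bytes(const char *utf8, unsigned char *out,
                                 size_t out_size)
{
    iconv_t cd = iconv_open("UCS-4", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv() takes char **, so the const has to be cast away here. */
    char *in_ptr = (char *)utf8;
    size_t in_left = strlen(utf8);
    char *out_ptr = (char *)out;
    size_t out_left = out_size;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    return out_size - out_left;
}
```

Converting "A" this way should give the four bytes 00 00 00 41 regardless of the host's byte order, with no BOM in front, which matches what the question observed.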

Upvotes: 2
