user2266126

Reputation: 13

ANSI to UTF-8 conversion

I would like to know whether:

  1. all characters encoded in ANSI (Windows-1252) can be converted to UTF-8 without any problem.
  2. not all characters encoded in UTF-8 can be converted to ANSI (Windows-1252) without any problem (example: Ǣ cannot be converted to the ANSI encoding).

Could you confirm that this is correct?

Thanks !

Upvotes: 1

Views: 3780

Answers (1)

Keith Thompson

Reputation: 263647

Yes, all characters representable in Windows-1252 have Unicode equivalents, and can therefore be converted to UTF-8. See this Wikipedia article for a table showing the mapping to Unicode code points.
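As a rough check of the first point, here is a minimal Python sketch that decodes every byte value with Python's built-in `cp1252` codec and re-encodes the result as UTF-8. The helper name `cp1252_to_utf8` is made up for illustration. One caveat, specific to how Python's codec treats the table: a few byte values have no character assigned in Windows-1252, so they fail to decode; that is not a case of a character lacking a Unicode equivalent.

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: reinterpret Windows-1252 bytes as UTF-8 bytes."""
    return data.decode("cp1252").encode("utf-8")


defined = []    # byte values that map to a character
undefined = []  # byte values with no character assigned in Python's cp1252 codec

for b in range(256):
    try:
        ch = bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(b)
        continue
    # Every decoded character must survive a UTF-8 round trip unchanged.
    assert ch.encode("utf-8").decode("utf-8") == ch
    defined.append(b)

print(len(defined), "byte values convert cleanly to UTF-8")
print("unassigned byte values:", [hex(b) for b in undefined])
```

For example, `cp1252_to_utf8(b"\x80")` gives the three-byte UTF-8 encoding of the euro sign.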

And since Windows-1252 is an 8-bit character set, it can represent at most 256 characters, whereas UTF-8 can encode every Unicode code point (over a million of them). So there are obviously plenty of characters representable in UTF-8 and not representable in Windows-1252, including the Ǣ from your example.
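The reverse direction can be illustrated the same way. This is only a sketch using Python's built-in `cp1252` codec; the function name `fits_cp1252` is hypothetical, not part of any library.

```python
def fits_cp1252(text: str) -> bool:
    """Hypothetical helper: True if every character in text exists in Windows-1252."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


print(fits_cp1252("é"))   # é is in Windows-1252
print(fits_cp1252("€"))   # so is the euro sign (byte 0x80)
print(fits_cp1252("Ǣ"))   # U+01E2 is not, so encoding fails

# If a lossy conversion is acceptable, an error handler substitutes a placeholder.
lossy = "Ǣ".encode("cp1252", errors="replace")
print(lossy)
```

Whether to reject such text or replace the unmappable characters depends on the application; `errors="replace"` silently loses information.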

Note that the name "ANSI" for the Windows-1252 encoding is strictly incorrect. When it was first proposed, it was intended to be an ANSI standard, but that never happened. Unfortunately, the name stuck. (Microsoft-related documentation also commonly refers to UTF-16 as "Unicode", another misnomer; UTF-16 is one representation of Unicode, but there are others.)

Upvotes: 3
