Gasfar Muhametdinov

Reputation: 37

Encode cp1252 string to utf-8 string in c#

How can I convert a cp1252 string to a utf-8 string in C#? I tried this code, but it doesn't work:

Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.GetEncoding(1251);
byte[] wind1252Bytes = ReadFile(myString1252);
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string myStringUtf8 = Encoding.UTF8.GetString(utf8Bytes);

Upvotes: 2

Views: 5893

Answers (1)

Jeppe Stig Nielsen

Reputation: 61912

var myGoodString = System.IO.File.ReadAllText(
    @"C:\path\to\file.txt",
    Encoding.GetEncoding("Windows-1252")
    );

A .NET/CLR string in memory cannot be UTF-8. It is just Unicode, or UTF-16 if you like.

The above code will properly read a text file in CP1252 into a .NET string.

If you insist on going through a byte[] wind1252Bytes, it is simply:

var myGoodString = Encoding.GetEncoding("Windows-1252").GetString(wind1252Bytes);
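
If what you actually need is a UTF-8 encoded byte[] (for example to write to a file or send over a network), a minimal sketch could look like the following. The class name Cp1252Converter and the sample bytes are made up for illustration, and the RegisterProvider call is only assumed to be necessary on runtimes that do not ship code page 1252 by default.

```csharp
using System;
using System.Text;

static class Cp1252Converter
{
    // Hypothetical helper: re-encodes CP1252 bytes as UTF-8 bytes.
    public static byte[] ToUtf8(byte[] wind1252Bytes)
    {
        // Assumption: needed on .NET Core / .NET 5+, where code page 1252
        // is not available until the code pages provider is registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding wind1252 = Encoding.GetEncoding(1252);

        // Note the target is Encoding.UTF8, not another code page.
        return Encoding.Convert(wind1252, Encoding.UTF8, wind1252Bytes);
    }

    static void Main()
    {
        // Sample input: "é€" in CP1252 (0xE9 = é, 0x80 = €).
        byte[] wind1252Bytes = { 0xE9, 0x80 };

        byte[] utf8Bytes = ToUtf8(wind1252Bytes);

        // Prints C3-A9-E2-82-AC
        Console.WriteLine(BitConverter.ToString(utf8Bytes));
    }
}
```

Decoding to a string first with GetString and then calling Encoding.UTF8.GetBytes on that string should give the same bytes; Encoding.Convert just does both steps in one call.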

Since this answer was written, newer versions of .NET have appeared which do not by default recognize all the old (legacy) Windows-specific code pages. If Encoding.GetEncoding("Windows-1252") throws an exception with your runtime version, try registering an additional provider with

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

(this may require an additional assembly reference to System.Text.Encoding.CodePages.dll) before you use Encoding.GetEncoding("Windows-1252").

See CodePagesEncodingProvider class documentation.
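
As a rough end-to-end sketch, converting a whole CP1252 text file to a UTF-8 file could look like this. The helper name Cp1252FileToUtf8 and the temporary files are hypothetical, and writing without a byte order mark is just one possible choice.

```csharp
using System;
using System.IO;
using System.Text;

static class FileReencoder
{
    // Hypothetical helper: reads a CP1252 text file and writes it back as UTF-8.
    public static void Cp1252FileToUtf8(string sourcePath, string targetPath)
    {
        // Register the legacy code pages before asking for Windows-1252.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Decode the file into an ordinary .NET string.
        string text = File.ReadAllText(sourcePath, Encoding.GetEncoding("Windows-1252"));

        // Encode the string as UTF-8 on the way out (here without a BOM).
        File.WriteAllText(targetPath, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));
    }

    static void Main()
    {
        string source = Path.GetTempFileName();
        string target = Path.GetTempFileName();

        // Sample input: "é€" in CP1252 (0xE9 = é, 0x80 = €).
        File.WriteAllBytes(source, new byte[] { 0xE9, 0x80 });

        Cp1252FileToUtf8(source, target);

        // Prints C3-A9-E2-82-AC
        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(target)));
    }
}
```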

Upvotes: 3
