Reputation: 483
I need to convert a string to UTF-8 in C#. I've already tried many ways, but none works as I want. I converted my string into a byte array and then tried to write it to an XML file (whose encoding is UTF-8), but either I got the same string (not encoded at all) or I got a list of bytes, which is useless. Has anyone faced the same issue?
Edit: This is some of the code I used:
str= "testé";
byte[] utf8Bytes = Encoding.UTF8.GetBytes(str);
return Encoding.UTF8.GetString(utf8Bytes);
The result is "testé", whereas I expected something like "testÃ©"...
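A minimal sketch of what is happening (the class and method names are hypothetical, for illustration only): `GetBytes` followed by `GetString` with the same encoding is a lossless round trip, so "testé" always comes back unchanged. A form like "testÃ©" only appears when the UTF-8 bytes are decoded with a single-byte encoding such as ISO-8859-1, which turns each byte into its own character.

```csharp
using System;
using System.Text;

public static class EncodingRoundTrip
{
    // Encoding and decoding with the same encoding returns the original string.
    public static string SameEncoding(string s) =>
        Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(s));

    // Decoding UTF-8 bytes as ISO-8859-1 maps each byte to one char:
    // 'é' is 0xC3 0xA9 in UTF-8, which reads as "Ã©" in ISO-8859-1.
    public static string MisDecoded(string s) =>
        Encoding.GetEncoding("iso-8859-1").GetString(Encoding.UTF8.GetBytes(s));

    // Hex view of the UTF-8 bytes, e.g. "74-65-73-74-C3-A9" for "testé".
    public static string Utf8Hex(string s) =>
        BitConverter.ToString(Encoding.UTF8.GetBytes(s));
}
```

In other words, the string in memory is not "wrong"; the encoding only matters when the bytes are written out or read back.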
Upvotes: 20
Views: 123566
Reputation: 1
I have tested this with TerraFX.Interop.Windows, passing a string as char*.
I found a way to resolve it: I wrote my own converter between string and char pointer:
namespace DeafMan1983.Interop.Runtime.Utilities;

using System;
using System.Runtime.InteropServices;

public static unsafe class UtilitiesForUTF16
{
    // string from char* (UTF-16, null-terminated)
    public static string CharPointerToString(char* ptr)
    {
        if (ptr == null)
            return string.Empty;
        return new string(ptr, 0, CharPointerLength(ptr));
    }

    // char* (UTF-16) from string. The copy lives in unmanaged memory
    // (a stackalloc buffer would be invalid once this method returns),
    // so the caller must release it with Marshal.FreeHGlobal.
    public static char* StringToCharPointer(string input)
    {
        if (input == null)
            return null;
        char* utf16Ptr = (char*)Marshal.AllocHGlobal((input.Length + 1) * sizeof(char)).ToPointer();
        fixed (char* inputPtr = input)
        {
            for (int i = 0; i < input.Length; i++)
                utf16Ptr[i] = inputPtr[i];
        }
        utf16Ptr[input.Length] = '\0'; // terminate the copy, not the managed string
        return utf16Ptr;
    }

    // Works like strlen, but counts UTF-16 chars up to the terminating '\0'
    public static int CharPointerLength(char* charPtrs)
    {
        if (charPtrs == null)
            return 0;
        int length = 0;
        while (charPtrs[length] != '\0')
            length++;
        return length;
    }
}
Test with Program.cs
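using System;
using System.Runtime.InteropServices;
using static DeafMan1983.Interop.Runtime.Utilities.UtilitiesForUTF16;

unsafe
{
    // Test for char* (UTF16)
    string string_str1 = "Hello World!";
    char* char_str1 = StringToCharPointer(string_str1);
    Console.WriteLine($"Result: {CharPointerToString(char_str1)}");
    Console.WriteLine($"Length of CharPointer: {CharPointerLength(char_str1)}");
    Marshal.FreeHGlobal((IntPtr)char_str1); // release the unmanaged copy

    // A stackalloc array is not null-terminated automatically, so add '\0'
    char* char_str2 = stackalloc char[] { 'H', 'e', 'l', 'l', 'o', '\0' };
    string str2 = CharPointerToString(char_str2);
    Console.WriteLine($"Result: {str2}");
    Console.WriteLine($"Length of CharPointer: {CharPointerLength(char_str2)}");
}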
I have also tested it with TerraFX.Interop.Windows, and the text no longer shows up as Chinese characters. Yay! I don't know whether that can still happen; if it does, we may need to use some of the Encoding classes. I will look into that.
Proof: with StringToCharPointer(), the window title shows normally.
Have Fun and enjoy your happy coding!
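For comparison, a minimal sketch of the same string to UTF-16 pointer round trip using the marshalling helpers that ship with .NET (`Marshal.StringToHGlobalUni`, `Marshal.PtrToStringUni`, `Marshal.FreeHGlobal`); the class and method names are hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

public static class Utf16Marshalling
{
    // Copies the string into unmanaged memory as null-terminated UTF-16,
    // reads it back, and always frees the unmanaged buffer.
    public static string RoundTrip(string input)
    {
        IntPtr buffer = Marshal.StringToHGlobalUni(input);
        try
        {
            // Reads UTF-16 chars up to the first '\0'
            return Marshal.PtrToStringUni(buffer) ?? string.Empty;
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```

The `IntPtr` returned by `StringToHGlobalUni` can be cast to `char*` inside an `unsafe` block when an API needs a raw pointer.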
Upvotes: 0
Reputation: 11
private static string Utf16ToUtf8(string utf16String)
{
    /*
     * Every .NET string stores text as UTF-16, known as
     * Encoding.Unicode. Other encodings may exist as a byte array,
     * or be incorrectly stored inside a UTF-16 string.
     *
     * UTF-8 = 1 to 4 bytes per character
     *   ["100" for the ASCII 'd']
     *   ["206" and "186" for the Greek 'κ']
     *
     * UTF-16 = 2 bytes per char (a surrogate pair takes two chars)
     *   ["100, 0" for the ASCII 'd']
     *   ["186, 3" for the Greek 'κ']
     *
     * UTF-8 inside UTF-16
     *   ["100, 0" for the ASCII 'd']
     *   ["206, 0" and "186, 0" for the Greek 'κ']
     *
     * Encoding.Convert turns the UTF-16 byte array into a UTF-8 byte
     * array. Decoding that with Encoding.UTF8.GetString would simply
     * return the original UTF-16 string again.
     *
     * So we imitate UTF-16 by putting each UTF-8 byte into the low
     * byte of a char and filling the high byte with 0.
     */

    // Get UTF-16 bytes and convert them to UTF-8 bytes
    byte[] utf16Bytes = Encoding.Unicode.GetBytes(utf16String);
    byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);

    // One char per UTF-8 byte; the cast leaves the char's high byte at 0
    // (unlike BitConverter.ToChar, it does not depend on machine byte order)
    char[] chars = new char[utf8Bytes.Length];
    for (int i = 0; i < utf8Bytes.Length; i++)
        chars[i] = (char)utf8Bytes[i];

    // Return the "UTF-8" string
    return new string(chars);
}
The original version of this function (in the answer that builds the result with `+=`) concatenates strings in a loop. Strings in .NET are immutable, so every concatenation allocates a new string and copies everything built so far, which makes that function visibly slow on large inputs. Don't do that. Use a char array instead, write into it directly, and convert the result to a string at the end. In my case, processing 500 KB of text, the difference was almost 5 minutes.
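A minimal sketch of the char-array approach next to a shortcut (class and method names are hypothetical): ISO-8859-1 maps every byte value 0 to 255 to the character with the same code, so decoding the UTF-8 bytes with it produces exactly the same "one byte per char" string as the loop, without writing the loop yourself.

```csharp
using System.Text;

public static class Utf8AsChars
{
    // One char per UTF-8 byte, high byte 0 (the loop from the answer above)
    public static string Loop(string s)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        char[] chars = new char[utf8.Length];
        for (int i = 0; i < utf8.Length; i++)
            chars[i] = (char)utf8[i];
        return new string(chars);
    }

    // ISO-8859-1 decodes byte b to U+00bb, giving the same result
    public static string ViaLatin1(string s) =>
        Encoding.GetEncoding("iso-8859-1").GetString(Encoding.UTF8.GetBytes(s));
}
```

For the Greek 'κ' both return the two chars U+00CE and U+00BA, matching the "206, 0" and "186, 0" layout described in the comment block.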
Upvotes: 1
Reputation: 1
using System;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        String unicodeString =
            "This Unicode string contains two characters " +
            "with codes outside the traditional ASCII code range, " +
            "Pi (\u03a0) and Sigma (\u03a3).";
        Console.WriteLine("Original string:");
        Console.WriteLine(unicodeString);

        UnicodeEncoding unicodeEncoding = new UnicodeEncoding();
        byte[] utf16Bytes = unicodeEncoding.GetBytes(unicodeString);
        // GetBytes emits no byte order mark, so decode from offset 0
        char[] chars = unicodeEncoding.GetChars(utf16Bytes, 0, utf16Bytes.Length);
        string s = new string(chars);

        Console.WriteLine("Char Array:");
        foreach (char c in chars) Console.Write(c);
        Console.WriteLine();
        Console.WriteLine("String from Char Array:");
        Console.WriteLine(s);
    }
}
Upvotes: 0
Reputation: 585
If you want a UTF-8 string where every byte is correct ('Ö' -> [195, 0], [150, 0]), you can use the following:
public static string Utf16ToUtf8(string utf16String)
{
    /*
     * Every .NET string stores text as UTF-16, known as
     * Encoding.Unicode. Other encodings may exist as a byte array,
     * or be incorrectly stored inside a UTF-16 string.
     *
     * UTF-8 = 1 to 4 bytes per character
     *   ["100" for the ASCII 'd']
     *   ["206" and "186" for the Greek 'κ']
     *
     * UTF-16 = 2 bytes per char (a surrogate pair takes two chars)
     *   ["100, 0" for the ASCII 'd']
     *   ["186, 3" for the Greek 'κ']
     *
     * UTF-8 inside UTF-16
     *   ["100, 0" for the ASCII 'd']
     *   ["206, 0" and "186, 0" for the Greek 'κ']
     *
     * Encoding.Convert turns the UTF-16 byte array into a UTF-8 byte
     * array. Decoding that with Encoding.UTF8.GetString would simply
     * return the original UTF-16 string again.
     *
     * So we imitate UTF-16 by putting each UTF-8 byte into the low
     * byte of a char and filling the high byte with 0.
     */

    // Storage for the UTF-8 string
    string utf8String = String.Empty;

    // Get UTF-16 bytes and convert them to UTF-8 bytes
    byte[] utf16Bytes = Encoding.Unicode.GetBytes(utf16String);
    byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);

    // Put each UTF-8 byte into its own char
    for (int i = 0; i < utf8Bytes.Length; i++)
    {
        // Because a char always holds 2 bytes, fill the second byte with 0
        // (BitConverter uses the machine's byte order; this assumes little-endian)
        byte[] utf8Container = new byte[2] { utf8Bytes[i], 0 };
        utf8String += BitConverter.ToChar(utf8Container, 0);
    }

    // Return the "UTF-8" string
    return utf8String;
}
In my case, the DLL also expects a UTF-8 string, but unfortunately that UTF-8 string must be interpreted through the ANSI code page and then stored as UTF-16 ('Ö' -> [195, 0], [19, 32]). So the ANSI '–', which is byte 150, has to be converted to the UTF-16 '–', which is 8211. If you have this case too, you can use the following instead:
public static string Utf16ToUtf8(string utf16String)
{
    // Get UTF-16 bytes and convert them to UTF-8 bytes
    byte[] utf16Bytes = Encoding.Unicode.GetBytes(utf16String);
    byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);

    // Return the UTF-8 bytes as an ANSI string. On .NET Framework,
    // Encoding.Default is the system ANSI code page; on .NET Core and
    // .NET 5+ it is UTF-8, so there this just returns the original string.
    return Encoding.Default.GetString(utf8Bytes);
}
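A minimal sketch of the same ANSI variant that names the code page explicitly instead of relying on `Encoding.Default`. It assumes the ANSI code page is Windows-1252 (where byte 150 is '–') and .NET Core 3.0 or later, where code page 1252 is only available after registering `CodePagesEncodingProvider`; the class and method names are hypothetical.

```csharp
using System.Text;

public static class AnsiView
{
    // Assumption: .NET Core 3.0+ / .NET 5+, where extra code pages
    // must be registered before Encoding.GetEncoding(1252) works.
    public static string Utf8BytesAsWindows1252(string s)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding windows1252 = Encoding.GetEncoding(1252);

        // 'Ö' -> UTF-8 bytes 195, 150 -> Windows-1252 chars 'Ã' (U+00C3), '–' (U+2013)
        return windows1252.GetString(Encoding.UTF8.GetBytes(s));
    }
}
```

Registering the provider more than once is harmless, but in a real application it is usually done once at startup.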
Or the native method (requires System.Runtime.InteropServices and System.Text):
[DllImport("kernel32.dll", SetLastError = true)]
private static extern Int32 WideCharToMultiByte(UInt32 CodePage, UInt32 dwFlags, [MarshalAs(UnmanagedType.LPWStr)] String lpWideCharStr, Int32 cchWideChar, [Out, MarshalAs(UnmanagedType.LPStr)] StringBuilder lpMultiByteStr, Int32 cbMultiByte, IntPtr lpDefaultChar, IntPtr lpUsedDefaultChar);

public static string Utf16ToUtf8(string utf16String)
{
    // First call: query the required buffer size. Passing -1 makes the
    // count include the terminating null, so "> 1" means "not empty".
    Int32 iNewDataLen = WideCharToMultiByte(Convert.ToUInt32(Encoding.UTF8.CodePage), 0, utf16String, -1, null, 0, IntPtr.Zero, IntPtr.Zero);
    if (iNewDataLen > 1)
    {
        StringBuilder utf8String = new StringBuilder(iNewDataLen);
        // Second call: convert into the buffer. The LPStr marshaller turns the
        // returned bytes back into a .NET string using the ANSI code page.
        WideCharToMultiByte(Convert.ToUInt32(Encoding.UTF8.CodePage), 0, utf16String, -1, utf8String, utf8String.Capacity, IntPtr.Zero, IntPtr.Zero);
        return utf8String.ToString();
    }
    return String.Empty;
}
If you need it the other way around, see Utf8ToUtf16. Hope I could be of help.
Upvotes: 20
Reputation: 292415
A string in C# is always UTF-16; there is no way to "convert" it. The encoding is irrelevant as long as you manipulate the string in memory; it only matters when you write the string to a stream (file, memory stream, network stream...).
If you want to write the string to an XML file, just specify the encoding when you create the XmlWriter.
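A minimal sketch of that advice (class, method, and element names are hypothetical): the string stays UTF-16 in memory, and the XmlWriter encodes it as UTF-8 while writing. `new UTF8Encoding(false)` is used only to avoid a byte order mark; `Encoding.UTF8` would also work but writes one.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class Utf8XmlExample
{
    // Writes <value>text</value>; the writer encodes the string as UTF-8
    // and emits an XML declaration with encoding="utf-8".
    public static byte[] WriteValue(string text)
    {
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
        using var stream = new MemoryStream();
        using (XmlWriter writer = XmlWriter.Create(stream, settings))
        {
            writer.WriteStartDocument();
            writer.WriteElementString("value", text);
            writer.WriteEndDocument();
        }
        return stream.ToArray();
    }
}
```

To write to a file instead, pass a file path to `XmlWriter.Create` with the same settings.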
Upvotes: 18
Reputation: 13315
Does this example help?
using System;
using System.IO;
using System.Text;

class Test
{
    public static void Main()
    {
        using (StreamWriter output = new StreamWriter("practice.txt"))
        {
            // Create and write a string containing the symbol for Pi.
            string srcString = "Area = \u03A0r^2";

            // Convert the UTF-16 encoded source string to UTF-8 and ASCII.
            byte[] utf8String = Encoding.UTF8.GetBytes(srcString);
            byte[] asciiString = Encoding.ASCII.GetBytes(srcString);

            // Write the UTF-8 and ASCII encoded byte arrays.
            output.WriteLine("UTF-8 Bytes: {0}", BitConverter.ToString(utf8String));
            output.WriteLine("ASCII Bytes: {0}", BitConverter.ToString(asciiString));

            // Convert UTF-8 and ASCII encoded bytes back to UTF-16 encoded
            // string and write.
            output.WriteLine("UTF-8 Text : {0}", Encoding.UTF8.GetString(utf8String));
            output.WriteLine("ASCII Text : {0}", Encoding.ASCII.GetString(asciiString));
        }
    }
}
Upvotes: 0
Reputation: 12025
Check the Jon Skeet answer to this other question: UTF-16 to UTF-8 conversion (for scripting in Windows)
It contains the source code that you need.
Hope it helps.
Upvotes: 0