I have this snippet of code to write a file asynchronously:
    using System.IO;
    using System.Text;
    using System.Threading.Tasks;

    // Writes to our output files
    private static async Task WriteTextAsync(string filePath, string text)
    {
        byte[] encodedText = Encoding.UTF8.GetBytes(text);
        using (FileStream sourceStream = new FileStream(filePath,
            FileMode.Create, FileAccess.Write, FileShare.None,
            bufferSize: 4096, useAsync: true))
        {
            await sourceStream.WriteAsync(encodedText, 0, encodedText.Length);
        }
    }
The created text file is still in ANSI format despite my having used Encoding.UTF8. There are 15 overloaded constructors for the FileStream class, and it's not at all obvious to me where I should set the encoding if not on the encoded text.
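FileStream itself has no notion of text encoding: it only writes the bytes it is given, so in the code above the encoding is already fixed by Encoding.UTF8.GetBytes. If you would rather hand the stream a string, the usual place to choose the encoding is a StreamWriter wrapped around the FileStream. The sketch below assumes you want to decide explicitly whether a BOM is written; the class name EncodedFileWriter and the emitBom parameter are hypothetical, for illustration only.

```csharp
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class EncodedFileWriter
{
    // FileStream only moves bytes; the encoding is chosen by whatever turns the
    // string into bytes, here the StreamWriter wrapped around the stream.
    // 'emitBom' is a hypothetical parameter added for illustration.
    public static async Task WriteTextAsync(string filePath, string text, bool emitBom)
    {
        // UTF8Encoding(true) makes StreamWriter write EF BB BF before the text;
        // UTF8Encoding(false) writes only the UTF-8 bytes of the text.
        var encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: emitBom);

        using (var stream = new FileStream(filePath, FileMode.Create, FileAccess.Write,
                                           FileShare.None, bufferSize: 4096, useAsync: true))
        using (var writer = new StreamWriter(stream, encoding))
        {
            await writer.WriteAsync(text);
        }
    }
}
```

Passing emitBom: false gives the same bytes as Encoding.UTF8.GetBytes(text); passing true prefixes them with EF BB BF.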
I can tell the file is ANSI because, when I open it in TextPad and view the file statistics, it lists ANSI as the code set.
I'm having problems because MySQL LOAD DATA INFILE is not reading the file properly. After reading the answers, I believe it has something to do with the BOM, but I'm not sure.
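Rather than relying on an editor's label, you can check the first bytes of the file directly. A minimal sketch, assuming the only question is whether the file begins with the UTF-8 BOM; HasUtf8Bom is a hypothetical helper, not a framework API.

```csharp
using System.IO;

public static class BomCheck
{
    // Hypothetical helper: true if the file starts with the UTF-8 BOM (EF BB BF).
    // A UTF-8 file without a BOM carries no such marker, so this returns false
    // for it, and a tool that relies on the BOM may report it as ANSI instead.
    public static bool HasUtf8Bom(string filePath)
    {
        byte[] head = new byte[3];
        int total = 0;
        using (FileStream stream = File.OpenRead(filePath))
        {
            // Read may return fewer bytes than requested, so loop until 3 bytes or EOF.
            while (total < head.Length)
            {
                int n = stream.Read(head, total, head.Length - total);
                if (n == 0) break;
                total += n;
            }
        }
        return total == 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
    }
}
```

Running this on the file from the first snippet should return false, because Encoding.UTF8.GetBytes never includes the BOM.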
I tried this (to add a BOM):
    // Prepend the UTF-8 BOM (EF BB BF) to the encoded text; Concat/ToArray need using System.Linq;
    byte[] encodedText = new byte[] { 0xEF, 0xBB, 0xBF }.Concat(Encoding.UTF8.GetBytes(text)).ToArray();
    using (FileStream sourceStream = new FileStream(filePath,
        FileMode.Create, FileAccess.Write, FileShare.None,
        bufferSize: 4096, useAsync: true))
    {
        await sourceStream.WriteAsync(encodedText, 0, encodedText.Length);
    }
TextPad then recognized it as UTF-8, but MySQL LOAD DATA INFILE still failed. After I re-saved the file in TextPad, MySQL read it properly.
I then changed the code to this:
    // File.CreateText returns a StreamWriter that encodes as UTF-8 without a BOM
    using (TextWriter writer = File.CreateText(filePath))
    {
        await writer.WriteAsync(text);
    }
This seemed to work in both TextPad and MySQL. I'm not sure what the issue with MySQL LOAD DATA INFILE is here.
I believe you forgot to write the BOM (byte order mark) at the beginning of the file. Since you are using FileStream (and not some sort of TextWriter), you have to write it manually. For UTF-8 the BOM is the three bytes EF BB BF.
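A sketch of that suggestion, assuming the same FileStream setup as in the question: Encoding.UTF8.GetPreamble() returns the UTF-8 BOM (EF BB BF), which is written before the encoded text instead of hard-coding the bytes. The class and method names are hypothetical.

```csharp
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class BomWriter
{
    // Writes the UTF-8 BOM manually, then the text, because FileStream never adds one.
    public static async Task WriteTextWithBomAsync(string filePath, string text)
    {
        byte[] bom = Encoding.UTF8.GetPreamble();          // EF BB BF
        byte[] encodedText = Encoding.UTF8.GetBytes(text); // never includes the BOM

        using (var sourceStream = new FileStream(filePath, FileMode.Create, FileAccess.Write,
                                                 FileShare.None, bufferSize: 4096, useAsync: true))
        {
            await sourceStream.WriteAsync(bom, 0, bom.Length);
            await sourceStream.WriteAsync(encodedText, 0, encodedText.Length);
        }
    }
}
```

For the text "A", the file contains the four bytes EF BB BF 41.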
No, it is definitely UTF-8:
    byte[] encodedText = Encoding.UTF8.GetBytes(text);
That can only give you UTF-8; you then write encodedText to the stream.
However! UTF-8 will look identical to ASCII/ANSI for any characters in the 0-127 range. It only looks different above that. False positive?
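One way to see this is to compare the bytes the two encodings produce. This sketch uses Encoding.ASCII as a stand-in for ANSI (Windows ANSI code pages such as 1252 agree with ASCII in the 0-127 range); SameBytesAsAscii is a hypothetical name.

```csharp
using System.Linq;
using System.Text;

public static class AsciiOverlap
{
    // For U+0000..U+007F, UTF-8 emits one byte equal to the code point, exactly as
    // ASCII does, so such text (written without a BOM) is byte-for-byte identical.
    // Above that range UTF-8 uses multi-byte sequences and the outputs differ.
    public static bool SameBytesAsAscii(string text) =>
        Encoding.UTF8.GetBytes(text).SequenceEqual(Encoding.ASCII.GetBytes(text));
}
```

SameBytesAsAscii("LOAD DATA 123") is true, while SameBytesAsAscii("caf\u00E9") is false, because UTF-8 encodes é as C3 A9.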