user17753

Reputation: 3161

Creating a UTF8 text file instead of ANSI

I have this snippet of code to write a file asynchronously:

    private static async Task WriteTextAsync(string filePath, string text)
    { //Writes to our output files
        byte[] encodedText = Encoding.UTF8.GetBytes(text);
        using (FileStream sourceStream = new FileStream(filePath,
            FileMode.Create, FileAccess.Write, FileShare.None,
            bufferSize: 4096, useAsync: true))
        {
            await sourceStream.WriteAsync(encodedText, 0, encodedText.Length);
        }
    }

The created text file is still reported as ANSI even though I encode the text with Encoding.UTF8. There are 15 overloaded constructors for the FileStream class, and it's not at all obvious to me where I should set the encoding, if not on the encoded text.

I can tell the file is ANSI because when I open it in TextPad and view the file statistics, it lists ANSI as the Code Set.

I'm having problems because MySQL LOAD DATA INFILE is not reading the file properly. After reading the answers, I believe it has something to do with the BOM, but I'm not sure.

I tried this (for BOM):

        byte[] encodedText = new byte[] { 0xEF, 0xBB, 0xBF }.Concat(Encoding.UTF8.GetBytes(text)).ToArray();
        using (FileStream sourceStream = new FileStream(filePath,
            FileMode.Create, FileAccess.Write, FileShare.None,
            bufferSize: 4096, useAsync: true))
        {
            await sourceStream.WriteAsync(encodedText, 0, encodedText.Length);
        }

TextPad then saw it as UTF-8, but MySQL LOAD DATA INFILE still failed. After I resaved the file in TextPad, MySQL read it properly.

I then changed the code to this:

        using (TextWriter writer = File.CreateText(filePath))
        {
            await writer.WriteAsync(text);
        }

This seemed to work in both TextPad and MySQL. I'm still not sure what the issue with MySQL LOAD DATA INFILE was.
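
For reference, the BOM can also be controlled explicitly rather than left to a default. As far as I know, File.CreateText produces UTF-8 without a BOM, whereas passing a UTF8Encoding to StreamWriter lets you choose either way. A minimal sketch, assuming a hypothetical helper class Utf8FileWriter and an emitBom parameter that are not part of the original code:

```csharp
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class Utf8FileWriter
{
    // Hypothetical helper (not from the original post).
    // emitBom decides whether the bytes EF BB BF are written before the text.
    public static async Task WriteTextAsync(string filePath, string text, bool emitBom)
    {
        var encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: emitBom);

        // append: false overwrites any existing file at filePath.
        using (var writer = new StreamWriter(filePath, append: false, encoding: encoding))
        {
            await writer.WriteAsync(text);
        }
    }
}
```

Writing the same text once with emitBom set to true and once with false, then comparing the two files in a hex viewer, should show whether the three leading bytes are what the consuming tool objects to.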

Upvotes: 3

Views: 13316

Answers (2)

Ondra

Reputation: 1647

I believe you forgot to write the BOM header at the beginning of the file. Since you are using FileStream (and not some sort of TextWriter), you have to write it manually. For UTF-8, the bytes are EF BB BF.
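
Rather than hard-coding the three bytes, they can be obtained from Encoding.UTF8.GetPreamble(). A rough sketch of writing the BOM manually with the FileStream from the question; the class name BomWriter and method name WriteTextWithBomAsync are made up for illustration:

```csharp
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class BomWriter
{
    // Hypothetical helper (not from the question): writes the UTF-8 preamble
    // first, then the encoded text, through the same asynchronous FileStream.
    public static async Task WriteTextWithBomAsync(string filePath, string text)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble(); // EF BB BF for Encoding.UTF8
        byte[] encodedText = Encoding.UTF8.GetBytes(text);

        using (FileStream stream = new FileStream(filePath,
            FileMode.Create, FileAccess.Write, FileShare.None,
            bufferSize: 4096, useAsync: true))
        {
            await stream.WriteAsync(preamble, 0, preamble.Length);
            await stream.WriteAsync(encodedText, 0, encodedText.Length);
        }
    }
}
```

Whether the program that reads the file accepts a BOM is a separate question; this only makes an editor's detection unambiguous.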

Upvotes: 2

Marc Gravell

Reputation: 1062935

No, it is definitely UTF-8:

    byte[] encodedText = Encoding.UTF8.GetBytes(text);

That can only give you UTF-8; you then write encodedText to the stream.

However! UTF-8 will look identical to ASCII/ANSI for any characters in the 0-127 range. It only looks different above that. Could the ANSI report be a false positive?
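
One way to check this for a given string is to compare its UTF-8 bytes with its ASCII bytes. A small sketch; the class EncodingComparison and the method LooksLikeAscii are illustrative names, not an existing API:

```csharp
using System.Linq;
using System.Text;

public static class EncodingComparison
{
    // Returns true when the UTF-8 bytes of the text equal its ASCII bytes,
    // i.e. when nothing in the file content can reveal which encoding was used.
    public static bool LooksLikeAscii(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        byte[] ascii = Encoding.ASCII.GetBytes(text);
        return utf8.SequenceEqual(ascii);
    }
}
```

LooksLikeAscii("plain text") returns true, while LooksLikeAscii("café") returns false, because é becomes the two bytes C3 A9 in UTF-8 and cannot be represented in ASCII. If the text written to the file only contains characters in the 0-127 range, an editor has no byte-level evidence to report anything other than ANSI.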

Upvotes: 4
