Ray

Reputation: 3449

File Encoding: Does this apply to C#'s WriteByte method?

I'm new to the subject of encoding and would like to understand it in greater detail. I found this MSDN example that creates a folder and a file. The file's contents are written with the WriteByte method. http://msdn.microsoft.com/en-us/library/as2f1fez.aspx

For convenience, I've placed the code directly below:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace CreateFolderFile
{
    class Program
    {
        static void Main(string[] args)
        {
            // Specify a "currently active folder"
            string activeDir = @"c:\testdir2";

            //Create a new subfolder under the current active folder
            string newPath = System.IO.Path.Combine(activeDir, "mySubDir");

            // Create the subfolder
            System.IO.Directory.CreateDirectory(newPath);

            // Create a new file name. This example generates
            // a random string.
            string newFileName = System.IO.Path.GetRandomFileName();

            // Combine the new file name with the path
            newPath = System.IO.Path.Combine(newPath, newFileName);

            // Create the file and write to it.
            // DANGER: System.IO.File.Create will overwrite the file
            // if it already exists. This can occur even with
            // random file names.
            if (!System.IO.File.Exists(newPath))
            {
                using (System.IO.FileStream fs = System.IO.File.Create(newPath))
                {
                    for (byte i = 0; i < 100; i++)
                    {
                        fs.WriteByte(i);
                    }
                }
            }

            // Read data back from the file to prove
            // that the previous code worked.
            try
            {

                byte[] readBuffer = System.IO.File.ReadAllBytes(newPath);
                foreach (byte b in readBuffer)
                {
                    Console.WriteLine(b);
                }
            }
            catch (System.IO.IOException e)
            {
                Console.WriteLine(e.Message);
            }



            // Keep the console window open in debug mode.
            System.Console.WriteLine("Press any key to exit.");
            System.Console.ReadKey();
        }
    }
}

I also found an interesting article by Joel Spolsky on this subject:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html

My questions: What encoding is used by the WriteByte method? And from the reading I've done, is it really possible to accurately determine the encoding of a file, no matter what tool you use? (For example, a CSV file you were sent, where you use Notepad++ to determine the encoding.)

Thoughts?

Upvotes: 0

Views: 469

Answers (2)

Guffa

Reputation: 700472

The WriteByte method doesn't use any encoding at all. The byte values are written exactly as specified, with no conversion.

Encoding is only used for text. Typically an entire text file uses the same encoding, but it's possible to have a file containing both binary data and encoded text.

The file itself doesn't have any information about any encoding. The file just contains bytes, and the encoding may be used to interpret the bytes as text.
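To illustrate that an encoding is only a mapping between text and bytes, here is a minimal sketch that encodes the same one-character string in two ways. The class and method names (EncodingDemo, EncodeUtf8, EncodeUtf16, Show) are made up for this example; Encoding.UTF8 and Encoding.Unicode (UTF-16 little-endian) are the standard .NET encodings.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
static class EncodingDemo
{
    // Encode text as UTF-8 bytes (GetBytes does not add a BOM).
    public static byte[] EncodeUtf8(string text) => Encoding.UTF8.GetBytes(text);

    // Encode text as UTF-16 little-endian bytes (GetBytes does not add a BOM).
    public static byte[] EncodeUtf16(string text) => Encoding.Unicode.GetBytes(text);

    public static void Show()
    {
        string text = "\u00E9"; // the single character U+00E9 (e with acute accent)
        Console.WriteLine(BitConverter.ToString(EncodeUtf8(text)));  // prints C3-A9
        Console.WriteLine(BitConverter.ToString(EncodeUtf16(text))); // prints E9-00
    }
}
```

The same character becomes two different byte sequences, and nothing in either sequence records which encoding produced it.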

Some file formats have an indicator at the beginning of the file that identifies the encoding. Typically you would read the first part of the file using a neutral encoding (ASCII for example) to get the information about what encoding to use. (It's a bit of a bootstrap problem.)

The first line of an XML file, for example, may contain an XML declaration (the line that carries the version), which may include an attribute specifying the encoding. Another example is the first character in a Unicode text file, which may be a BOM (byte order mark) that can be used to determine which type of Unicode encoding was used.
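As a rough sketch of the BOM check, the following inspects the first few bytes of a file for the well-known byte order marks. BomSniffer and DetectBom are hypothetical names for this example, and the result is only a guess: a file without a BOM returns null, and the bytes FF FE 00 00 could in principle also be UTF-16 LE text that starts with a NUL character.

```csharp
using System;

// Hypothetical helper class, for illustration only.
static class BomSniffer
{
    // Returns the name of the encoding suggested by a leading BOM,
    // or null when the data does not start with a recognized BOM.
    public static string DetectBom(byte[] head)
    {
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return "UTF-8";

        // UTF-32 LE must be tested before UTF-16 LE: both BOMs start with FF FE.
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return "UTF-32 LE";
        if (head.Length >= 4 && head[0] == 0x00 && head[1] == 0x00 && head[2] == 0xFE && head[3] == 0xFF)
            return "UTF-32 BE";

        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return "UTF-16 LE";
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return "UTF-16 BE";

        return null; // no BOM: the encoding has to be known or guessed some other way
    }
}
```

You could pass it the start of a byte array from System.IO.File.ReadAllBytes. Note that the file from the question, which starts with the bytes 0, 1, 2, ..., has no BOM, so this would return null for it.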

Upvotes: 1

Jon Skeet

Reputation: 1501646

Stream.WriteByte deals with bytes as both input (the parameter to the method) and output (the target stream), which are inherently binary data - so the concept of an encoding (a mapping between text and binary information) doesn't apply.

Now, if you were to read the file created using WriteByte calls as if it were a text file, that would require you to interpret it in a particular encoding. That's a different matter - the contents of the file are still just bytes.
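Here is a small sketch of that distinction. It writes raw bytes with WriteByte and then decodes the very same bytes as text under a caller-chosen encoding. DecodeDemo and WriteBytesThenReadAsText are names invented for this example, and the temporary file is just a convenience.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only.
static class DecodeDemo
{
    // Writes the bytes unchanged with WriteByte, reads them back,
    // and only then interprets them as text using the given encoding.
    public static string WriteBytesThenReadAsText(byte[] bytes, Encoding encoding)
    {
        string path = Path.GetTempFileName();
        try
        {
            using (FileStream fs = File.Create(path))
            {
                foreach (byte b in bytes)
                {
                    fs.WriteByte(b); // no encoding involved here
                }
            }

            // The encoding only matters at this point, when bytes become text.
            return encoding.GetString(File.ReadAllBytes(path));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

With the bytes C3 A9, passing Encoding.UTF8 yields the single character U+00E9, while passing Encoding.GetEncoding("iso-8859-1") yields the two characters U+00C3 U+00A9. The file is identical in both cases; only the interpretation differs.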

As noted in Guffa's answer, a file doesn't (typically, anyway [1]) have any notion of an encoding. It's just a bucket of bytes. If your file is just plain text, you have to either know what the encoding is when you read it, or infer it with heuristics.

[1] A file system could keep metadata about encodings, of course - but it would be up to the creating program to set it.
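As one example of such a heuristic (a sketch, not a reliable detector), you can check whether the bytes are at least valid UTF-8 by decoding with a strict UTF8Encoding that throws on invalid sequences. Utf8Heuristic and LooksLikeUtf8 are hypothetical names for this example.

```csharp
using System.Text;

// Hypothetical helper class, for illustration only.
static class Utf8Heuristic
{
    // Returns true when the data decodes as UTF-8 without errors.
    // This proves only that the bytes COULD be UTF-8, not that they are.
    public static bool LooksLikeUtf8(byte[] data)
    {
        var strict = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            strict.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false; // contains a byte sequence that is not valid UTF-8
        }
    }
}
```

A lone byte E9 (an accented e in ISO-8859-1) fails the check, but pure ASCII data passes it and would pass the equivalent check for many other encodings too, which is why this kind of inference can never be fully accurate.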

Upvotes: 1
