Aaron

Reputation: 7541

Using wrong encoding when writing to a file C#

I'm creating a binary file to transmit to a third party that contains images and information about each image. The file uses a record length format, so each record is a particular length. The beginning of each record is the Record Length Indicator, which is 4 characters long and represents the length of the record in Big Endian format.

I'm using a BinaryWriter to write to the file, and for the Record Length Indicator I'm using Encoding.Default.

The problem I'm having is that there is one character in one record that is displaying as a "?" because it is unrecognized. My algorithm to build the string for the record length indicator is this:

    private string toBigEndian(int value)
    {
        string returnValue = "";
        string binary = Convert.ToString(value, 2).PadLeft(32, '0');
        List<int> binaryBlocks = new List<int>();
        binaryBlocks.Add(Convert.ToInt32(binary.Substring(0, 8), 2));
        binaryBlocks.Add(Convert.ToInt32(binary.Substring(8, 8), 2));
        binaryBlocks.Add(Convert.ToInt32(binary.Substring(16, 8), 2));
        binaryBlocks.Add(Convert.ToInt32(binary.Substring(24, 8), 2));

        foreach (int block in binaryBlocks)
        {
            returnValue += (char)block;
        }

        Console.WriteLine(value);

        return returnValue;
    }

It takes the length of the record, converts it to 32-bit binary, converts that to chunks of 8-bit binary, and then converts each chunk to its appropriate character. The string that is returned here does contain the correct characters, but when it's written to the file, one character is unrecognized. This is how I'm writing it:

//fileWriter is BinaryWriter and record is Encoding.Default
fileWriter.Write(record.GetBytes(toBigEndian(length)));

Perhaps I'm using the wrong type of encoding? I've tried UTF-8, which should work, but it gives me extra characters sometimes.

Thanks in advance for your help.

Upvotes: 2

Views: 1889

Answers (4)

Guffa

Reputation: 700720

The problem is that you should not return the value as a string at all.

When you cast each value to a char and then encode it with an 8-bit encoding, several values are encoded as the wrong byte, and several cannot be encoded at all (which produces the ? characters). The only way to avoid losing data in that step would be to encode the string as UTF-16, but that would give you eight bytes instead of four.
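To make the loss concrete, here is a minimal sketch that pushes a length through the same char-based approach and encodes the result. It uses Encoding.ASCII and Encoding.UTF8 as stand-ins, because what Encoding.Default maps to depends on the runtime and the system code page. The class and method names are hypothetical and not from the question.

```csharp
using System;
using System.Text;

public static class EncodingLossDemo
{
    // Mimics the question's approach: each byte of the value becomes one char,
    // most significant byte first.
    public static string ToCharString(int value)
    {
        var chars = new char[4];
        for (int i = 0; i < 4; i++)
        {
            chars[i] = (char)((value >> (24 - 8 * i)) & 0xFF);
        }
        return new string(chars);
    }

    // Encodes the char string with the given encoding and returns the
    // resulting bytes as hex text so they are easy to compare.
    public static string EncodeToHex(int value, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(ToCharString(value)));
    }
}
```

For a length of 200 the intended bytes are 00 00 00 C8. EncodeToHex(200, Encoding.ASCII) gives 00-00-00-3F (the last byte became "?"), Encoding.UTF8 gives 00-00-00-C3-88 (five bytes instead of four), and Encoding.Unicode gives 00-00-00-00-00-00-C8-00 (lossless, but eight bytes).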

You should return it as a byte array, so that you can write it to the file without converting it back and forth between character data and binary data.

private byte[] toBigEndian(int value) {
   byte[] result = BitConverter.GetBytes(value);
   if (BitConverter.IsLittleEndian) Array.Reverse(result);
   return result;
}

fileWriter.Write(toBigEndian(length));
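On newer runtimes (.NET Core 2.1 and later, as far as I know), System.Buffers.Binary.BinaryPrimitives writes the bytes in a fixed order, so the IsLittleEndian check is not needed. The following is a sketch under that assumption. The class and method names are hypothetical, and a MemoryStream stands in for the real file.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

public static class RecordLength
{
    // Returns the 4-byte big-endian form of value, regardless of the
    // byte order of the machine the code runs on.
    public static byte[] ToBigEndian(int value)
    {
        byte[] result = new byte[4];
        BinaryPrimitives.WriteInt32BigEndian(result, value);
        return result;
    }

    // Writes the length indicator through a BinaryWriter and returns
    // exactly what ended up in the stream.
    public static byte[] WriteLengthToStream(int length)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new BinaryWriter(stream))
            {
                // Write(byte[]) emits the raw bytes with no length prefix.
                writer.Write(ToBigEndian(length));
            }
            // ToArray still works after the stream has been closed.
            return stream.ToArray();
        }
    }
}
```

Because no string or Encoding is involved at any point, every value from 0 to 255 in each position survives unchanged.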

Upvotes: 6

Dr Spack

Reputation: 181

Do not create a string from an int in order to write bytes. Try this instead:

byte[] result = 
    {
      (byte)( value >> 24 ),
      (byte)( value >> 16 ),
      (byte)( value >> 8 ) ,
      (byte)( value >> 0 )
    };
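Wrapped in a method, the shifts might look like the sketch below. The class and method names are illustrative. The unchecked block is an addition that is not in the snippet above: the casts deliberately discard the upper bits, which would throw an OverflowException if the project were compiled with overflow checking enabled.

```csharp
using System;

public static class ShiftEncoder
{
    // Splits a 32-bit value into four bytes, most significant byte first.
    public static byte[] ToBigEndian(int value)
    {
        unchecked
        {
            return new byte[]
            {
                (byte)(value >> 24), // bits 31..24
                (byte)(value >> 16), // bits 23..16
                (byte)(value >> 8),  // bits 15..8
                (byte)value          // bits 7..0
            };
        }
    }
}
```

For example, ShiftEncoder.ToBigEndian(1889) gives the bytes 00 00 07 61, and the result does not depend on the byte order of the host machine.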

Upvotes: 1

Remus Rusanu

Reputation: 294407

To read or write values in binary streams with the appropriate endianness, use the BitConverter class, which reports the byte order of the current platform through its IsLittleEndian field: http://msdn.microsoft.com/en-us/library/system.bitconverter.islittleendian.aspx

Converting to a binary string and then tokenizing it into bytes is, I must say, the most unorthodox way I have seen yet :)

Upvotes: 0

Simon Steele

Reputation: 11608

If you really want a binary four bytes (i.e. not just four characters, but a big-endian 32-bit length value) then you want something like this:

byte[] bytes = new byte[4];
// Big-endian: the most significant byte goes first, at index 0.
bytes[0] = (byte)((value >> 24) & 0xff);
bytes[1] = (byte)((value >> 16) & 0xff);
bytes[2] = (byte)((value >> 8) & 0xff);
bytes[3] = (byte)(value & 0xff);
fileWriter.Write(bytes);

Upvotes: 1
