davymartu

Reputation: 1443

How to read a file that mixes binary data and strings

I have a mixed file containing many lines of text plus sections of binary data. Example:

--Begin Attach
Content-Info: /Format=TIF
Content-Description: 30085949.tif (TIF File)
Content-Transfer-Encoding: binary; Length=220096
II*II* Îh  ÿÿÿÿÿÿü³küìpsMg›Êq™Æ™Ôd™‡–h7ÃAøAú áùõ=6?Eã½/ô|û ƒú7z:>„Çÿý<þ¯úýúßj?å¿þÇéöûþ“«ÿ¾ÁøKøÈ%ŠdOÿÞÈ<,Wþ‡ÿ·ƒïüúCÿß%Ï$sŸÿÃÿ÷‡þåiò>GÈù#ä|‘ò:#ä|Š":#¢:;ˆèŽˆèʤV‘ÑÑÑÑÑÑÑÑÑçIþ×o(¿zHDDDDDFp'.Ñ:ˆR:aAràÁ¬LˆÈù!ÿÿï[ÿ¯Äàiƒ"VƒDÇ)Ê6PáÈê$9C”9C†‡CD¡pE@¦œÖ{i~Úý¯kköDœ4ÉU”8`ƒt!l2G
--End Attach--

I tried to read the file as text (File.ReadAllLines, which reads through a StreamReader):

string[] lines = System.IO.File.ReadAllLines(@"C:\Users\Davide\Desktop\20041230000D.xmm");

I read the file line by line, and when a line equals "Content-Transfer-Encoding: binary; Length=220096", I read all the following lines and write them to a file named after the attachment (in this case 30085949.tif). But I'm reading strings, not bytes, so the resulting file is damaged (for now I'm testing with a TIFF file). Any suggestions?

SOLUTION: Thanks for the replies. I adopted this solution: I built a LineReader class that extends BinaryReader:

    public class LineReader : BinaryReader
    {
        public LineReader(Stream stream, Encoding encoding)
            : base(stream, encoding)
        {
        }

        // Number of characters consumed by the last ReadLine call, including the line feed.
        // This matches the number of bytes only for single-byte encodings.
        public int currentPos;
        private StringBuilder stringBuffer;

        public string ReadLine()
        {
            currentPos = 0;
            char[] buf = new char[1];
            stringBuffer = new StringBuilder();

            // Read one character at a time and stop at the line feed.
            while (base.Read(buf, 0, 1) > 0)
            {
                currentPos++;
                if (buf[0] == '\n')
                {
                    return stringBuffer.ToString();
                }
                stringBuffer.Append(buf[0]);
            }

            // End of stream reached without a trailing line feed.
            return stringBuffer.ToString();
        }
    }

Here '\n' is the line feed character (code 10). I parse my file like this:

    using (LineReader b = new LineReader(File.OpenRead(path), Encoding.Default))
    {
        int pos = 0;
        int length = (int)b.BaseStream.Length;
        while (pos < length)
        {
            string line = b.ReadLine();
            pos += (b.currentPos);

            if (!beginNextPart)
            {
                if (line.StartsWith(BEGINATTACH))
                {
                    beginNextPart = true;

                }
            }
            else
            {
                if (line.StartsWith(ENDATTACH))
                {
                    beginNextPart = false;
                }
                else
                {
                    if (line.StartsWith("Content-Transfer-Encoding: binary; Length="))
                    {
                        attachLength = Convert.ToInt32(line.Replace("Content-Transfer-Encoding: binary; Length=", ""));
                        byte[] attachData = b.ReadBytes(attachLength);
                        pos += (attachLength);
                        ByteArrayToFile(@"C:\users\davide\desktop\files.tif", attachData);
                    }
                }
            }
        }
    }

I read the byte length from the header line, then read the following n bytes as raw data.

Upvotes: 4

Views: 2409

Answers (1)

MattW

Reputation: 4628

Your problem here is that a StreamReader assumes that it is the only thing reading the file, and as a result it reads ahead. Your best bet is to read the file as binary and use the appropriate text encoding to retrieve the string data out of your own buffer.
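To make the read-ahead visible, here is a minimal sketch. The class and method names (ReadAheadDemo, PositionAfterFirstLine) are hypothetical, and a MemoryStream stands in for the real file:

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadAheadDemo
{
    // Reads one line of text through a StreamReader and returns the
    // position of the underlying stream afterwards.
    public static long PositionAfterFirstLine(byte[] data)
    {
        using (var stream = new MemoryStream(data))
        using (var reader = new StreamReader(stream, Encoding.ASCII))
        {
            reader.ReadLine();       // logically consumes only the first line
            return stream.Position;  // but the reader has buffered further ahead
        }
    }
}
```

For a buffer holding "header\n" followed by a few binary bytes, the returned position lies beyond byte 7, because the reader has already pulled the binary bytes into its own buffer. Reading from the stream directly at that point would skip them.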

Since apparently you don't mind reading the entire file into memory, you can start with:

byte[] buf = System.IO.File.ReadAllBytes(@"C:\Users\Davide\Desktop\20041230000D.xmm");

Then assuming you're using UTF-8 for your text data:

int offset = 0;
int binaryLength = 0;
while (binaryLength == 0 && offset < buf.Length) {
    // In a UTF-8 stream, byte 10 (line feed) never occurs inside a multi-byte character
    int eolIdx = Array.IndexOf(buf, (byte)10, offset);
    if (eolIdx < 0) eolIdx = buf.Length; // last line has no terminator
    string line = System.Text.Encoding.UTF8.GetString(buf, offset, eolIdx - offset);

    // Process your line appropriately here, and set binaryLength if you expect binary data to follow

    offset = Math.Min(eolIdx + 1, buf.Length);
}

// You don't necessarily need to copy binary data out, but just to show where it is:
var binary = new byte[binaryLength];
Buffer.BlockCopy(buf, offset, binary, 0, binaryLength);

You might also want to call line.TrimEnd('\r') if you expect Windows-style line endings.
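Putting the pieces together, a rough sketch of the whole extraction might look like the following. The names AttachmentParser, Extract and LengthPrefix are hypothetical. It assumes the header format shown in the question and that each payload starts immediately after the line feed that ends its Length header:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class AttachmentParser
{
    // Header prefix taken from the sample file in the question.
    private const string LengthPrefix = "Content-Transfer-Encoding: binary; Length=";

    // Returns the binary payload of every attachment found in buf.
    public static List<byte[]> Extract(byte[] buf)
    {
        var attachments = new List<byte[]>();
        int offset = 0;
        while (offset < buf.Length)
        {
            // Find the next line feed (byte 10); treat end of buffer as end of line.
            int eolIdx = Array.IndexOf(buf, (byte)10, offset);
            if (eolIdx < 0) eolIdx = buf.Length;

            string line = Encoding.UTF8.GetString(buf, offset, eolIdx - offset).TrimEnd('\r');
            offset = Math.Min(eolIdx + 1, buf.Length);

            if (!line.StartsWith(LengthPrefix, StringComparison.Ordinal)) continue;

            int length;
            if (!int.TryParse(line.Substring(LengthPrefix.Length), out length)) continue;

            // Guard against a declared length that runs past the end of the buffer.
            length = Math.Max(0, Math.Min(length, buf.Length - offset));

            var data = new byte[length];
            Buffer.BlockCopy(buf, offset, data, 0, length);
            attachments.Add(data);

            offset += length; // skip the payload; line parsing resumes after it
        }
        return attachments;
    }
}
```

Each returned array can then be written out with File.WriteAllBytes. Because the payload is skipped by length and never scanned for line feeds, a byte 10 inside the binary data does not confuse the line parsing.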

Upvotes: 3
