isxaker

Reputation: 9456

How to read file by chunks

I'm a little confused about how I should read a large file (> 8 GB) in chunks when each chunk has its own size.

If I know the chunk size, it looks like the code below:

using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, ProgramOptions.BufferSizeForChunkProcessing))
{
    using (BufferedStream bs = new BufferedStream(fs, ProgramOptions.BufferSizeForChunkProcessing))
    {
        byte[] buffer = new byte[ProgramOptions.BufferSizeForChunkProcessing];
        int byteRead;
        while ((byteRead = bs.Read(buffer, 0, ProgramOptions.BufferSizeForChunkProcessing)) > 0)
        {
            byte[] originalBytes;
            using (MemoryStream mStream = new MemoryStream())
            {
                mStream.Write(buffer, 0, byteRead);
                originalBytes = mStream.ToArray();
            }
        }
    }
}

But imagine I've read a large file in chunks, encoded each chunk (which changes its size), and written all the processed chunks to a new file. Now I need to do the reverse operation, but I don't know the exact size of each chunk. I have an idea: after each chunk has been processed, I write the new chunk size before the chunk bytes, like this:

Number of block bytes
Block bytes
Number of block bytes
Block bytes

So in that case, the first thing I need to do is read the chunk's header to learn the exact chunk size. I only read and write byte arrays to the file. But I have a question: what should the chunk's header look like? Should the header perhaps contain some kind of boundary marker?

Upvotes: 2

Views: 17334

Answers (2)

Sajeepan

Reputation: 1

Try this:

public static IEnumerable<byte[]> ReadChunks(string fileName)
{
    const int MAX_BUFFER = 1048576; // 1 MB

    using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        long remainBytes = fs.Length;

        while (remainBytes > 0)
        {
            // Allocate a fresh array per chunk so callers can safely keep earlier chunks.
            int chunkSize = (int)Math.Min(remainBytes, MAX_BUFFER);
            byte[] filechunk = new byte[chunkSize];

            // Read may return fewer bytes than requested, so loop until the chunk is full.
            int filled = 0;
            while (filled < chunkSize)
            {
                int numBytes = fs.Read(filechunk, filled, chunkSize - filled);
                if (numBytes == 0)
                    throw new EndOfStreamException("File ended before the expected length.");
                filled += numBytes;
            }

            remainBytes -= chunkSize;
            yield return filechunk;
        }
    }
}

Upvotes: 0

Matthew Watson

Reputation: 109567

If the file is rigidly structured so that each block of data is preceded by a 32-bit length value, then it is easy to read. The "header" for each block is just the 32-bit length value.
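As a rough illustration of what such a header could look like, the sketch below encodes and decodes a 4-byte length with `BitConverter`. The `BlockHeader` class and its method names are made up for this example. It also assumes the file is written and read on machines with the same byte order, because `BitConverter` uses the machine's native order.

```csharp
using System;

// Hypothetical helper, named here only for illustration.
public static class BlockHeader
{
    // Turns a block length into the 4 header bytes (machine byte order).
    public static byte[] Encode(int blockLength)
    {
        if (blockLength < 0)
            throw new ArgumentOutOfRangeException(nameof(blockLength));

        return BitConverter.GetBytes(blockLength);
    }

    // Recovers the block length from the 4 header bytes.
    public static int Decode(byte[] header)
    {
        if (header == null || header.Length != sizeof(int))
            throw new ArgumentException("Header must be exactly 4 bytes.", nameof(header));

        return BitConverter.ToInt32(header, 0);
    }
}
```

No separate boundary marker is needed with this layout: the length says exactly where the next header starts.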

If you want to read such a file, the easiest way is probably to encapsulate the reading into a method that returns IEnumerable<byte[]> like so:

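public static IEnumerable<byte[]> ReadChunks(string path)
{
    var lengthBytes = new byte[sizeof(int)];

    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        while (true)  // Keep going until the end of the file.
        {
            int n = ReadFully(fs, lengthBytes, sizeof(int));  // Read block size.

            if (n == 0)      // End of file.
                yield break;

            if (n != sizeof(int))
                throw new InvalidOperationException("Invalid header");

            int blockLength = BitConverter.ToInt32(lengthBytes, 0);

            if (blockLength < 0)
                throw new InvalidOperationException("Invalid header");

            var buffer = new byte[blockLength];
            n = ReadFully(fs, buffer, blockLength);  // Read block bytes.

            if (n != blockLength)
                throw new InvalidOperationException("Missing data");

            yield return buffer;
        }
    }
}

// Stream.Read may return fewer bytes than requested, so keep reading until
// count bytes have arrived or the stream ends. Returns the number of bytes read.
private static int ReadFully(Stream stream, byte[] buffer, int count)
{
    int total = 0;
    int n;
    while (total < count && (n = stream.Read(buffer, total, count - total)) > 0)
        total += n;
    return total;
}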

Then you can use it simply:

foreach (var block in ReadChunks("MyFileName"))
{
    // Process block.
}

Note that you don't need to provide your own buffering, since FileStream already buffers reads internally.
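The writing side could look something like the sketch below. It assumes the same layout the reader expects: a 4-byte length followed by that many bytes. `ChunkFile` and `WriteChunks` are names invented for this example. `BinaryWriter.Write(int)` always writes little-endian, which matches what `BitConverter.ToInt32` reads on little-endian machines (the usual case).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, named here only for illustration.
public static class ChunkFile
{
    // Writes each chunk as: 4-byte length (little-endian), then the chunk bytes.
    public static void WriteChunks(string path, IEnumerable<byte[]> chunks)
    {
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
        using (var writer = new BinaryWriter(fs))
        {
            foreach (byte[] chunk in chunks)
            {
                writer.Write(chunk.Length);  // Header: number of block bytes.
                writer.Write(chunk);         // Body: the block bytes themselves.
            }
        }
    }
}
```

Because `WriteChunks` takes an `IEnumerable<byte[]>`, the processed chunks can be streamed straight from a lazy source, so the whole file never has to sit in memory.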

Upvotes: 8
