dude

Reputation: 29

Variable chunk size

I am using the code below to break a stream into smaller chunks. However, my chunk size is constant and I want it to be variable. I want the program to read until it hits the symbol '$' and use the position of that '$' as the chunk size.

For example, let's say the text file contains 01234583145329$34212349$2134567009$. My first chunk size should then be 14, the second 8, and the third 10. I did some research and found out that this can be achieved with the IndexOf method, but I am not able to implement that with the code below. Please advise.

If there is another efficient way other than IndexOf, please let me know.

public static IEnumerable<IEnumerable<byte>> ReadByChunk(int chunkSize)
{
    IEnumerable<byte> result;
    int startingByte = 0;

    do
    {
        result = ReadBytes(startingByte, chunkSize);
        startingByte += chunkSize;
        yield return result;
    }
    while (result.Any());
}

public static IEnumerable<byte> ReadBytes(int startingByte, int byteToRead)
{
    byte[] result;
    using (FileStream stream = File.Open(@"C:\Users\file.txt", FileMode.Open, FileAccess.Read, FileShare.Read))
    using (BinaryReader reader = new BinaryReader(stream))
    {
        int bytesToRead = Math.Max(Math.Min(byteToRead, (int)reader.BaseStream.Length - startingByte), 0);
        reader.BaseStream.Seek(startingByte, SeekOrigin.Begin);
        result = reader.ReadBytes(bytesToRead);
        // int chunkSize = IndexOf ... (unfinished: this is where I want to find the '$' position)
    }
    return result;
}

static void Main()
{
    int chunkSize = 8;
    foreach (IEnumerable<byte> bytes in ReadByChunk(chunkSize))
    {
        // more code
    }
}

Upvotes: 0

Views: 808

Answers (1)

Sweeper

Reputation: 272760

You seem to care about characters, not bytes here, as you are trying to find $ characters. Just reading bytes will only work in the specific case of "each character is encoded with one byte". Therefore, you should use ReadChar and return IEnumerable<IEnumerable<char>> instead.
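To illustrate why bytes and characters are not interchangeable, here is a minimal sketch. The class and method names are hypothetical and it assumes the file is UTF-8 encoded, which the question does not state.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Returns how many bytes UTF-8 needs to encode the given text.
    // Assumption: UTF-8 is used only as an example encoding.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }
}
```

Under that assumption, Utf8ByteCount("$") is 1 but Utf8ByteCount("€") is 3, so a byte offset only matches a character position while every character happens to fit in one byte.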

You seem to be creating a new reader and stream for each chunk, which I feel is quite unnecessary. You could just create one stream and one reader in ReadByChunk, and pass it to the ReadBytes method.

The IndexOf you found is probably for strings. I assume you want to lazily read from a stream, so reading everything into a string first and then using IndexOf seems to go against your intention.
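For comparison, this is roughly what the string-based IndexOf approach could look like if the whole text is already in memory. It is only a sketch: the class and method names are hypothetical, and it is not lazy, so it suits small files at best.

```csharp
using System;
using System.Collections.Generic;

public static class IndexOfDemo
{
    // Splits text into the chunks that precede each '$', using string.IndexOf.
    // Assumption: the entire text has already been read into a string.
    public static List<string> SplitOnDollar(string text)
    {
        var chunks = new List<string>();
        int start = 0;
        int end;
        while ((end = text.IndexOf('$', start)) != -1)
        {
            // The chunk length is the distance from the chunk start to the '$'.
            chunks.Add(text.Substring(start, end - start));
            start = end + 1; // skip the '$' itself
        }
        // Keep any trailing text that has no closing '$'.
        if (start < text.Length)
        {
            chunks.Add(text.Substring(start));
        }
        return chunks;
    }
}
```

With the sample input from the question, SplitOnDollar returns three strings of length 14, 8 and 10.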

For a text file, I would also recommend using StreamReader. BinaryReader is meant for reading binary files.

Here's my attempt:

public static IEnumerable<IEnumerable<char>> ReadByChunk()
{
    // One stream and one reader are shared by all chunks.
    using (StreamReader reader = new StreamReader(File.Open(...))) {
        while (reader.Peek() != -1) { // while not at the end of the stream...
            // Each chunk is lazy: enumerate it fully before asking for the next one,
            // otherwise the reader does not advance.
            yield return ReadUntilNextDollarSign(reader);
        }
    }
}

public static IEnumerable<char> ReadUntilNextDollarSign(StreamReader reader)
{
    char c;
    // while not at the end of the stream, and the next char is not a dollar sign...
    // (the '$' itself is consumed but not returned)
    while (reader.Peek() != -1 && (c = (char)reader.Read()) != '$') {
        yield return c;
    }
}
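If the chunk lengths themselves are needed, one possible variant (not the code above, and with hypothetical names) builds each chunk into a string before yielding it. It accepts any TextReader, so it can be tried with a StringReader instead of a file; the trade-off is that each chunk is held in memory rather than streamed character by character.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ChunkReader
{
    // Reads chunks terminated by '$' from any TextReader.
    // Each chunk is built completely before it is yielded, so the caller
    // does not have to enumerate it character by character.
    public static IEnumerable<string> ReadChunks(TextReader reader)
    {
        var current = new StringBuilder();
        int next;
        while ((next = reader.Read()) != -1) // Read returns -1 at the end of input
        {
            char c = (char)next;
            if (c == '$')
            {
                yield return current.ToString();
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        // Yield trailing text that has no closing '$'.
        if (current.Length > 0)
        {
            yield return current.ToString();
        }
    }
}
```

Calling ReadChunks(new StringReader("01234583145329$34212349$2134567009$")) yields three strings whose Length values are 14, 8 and 10. For a file, a StreamReader can be passed in the same way.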

Upvotes: 3
