Ankit Kumar

Reputation: 514

Read lines in batches in C#

I have the code below, which reads a .json stream line by line. Since this will be a lengthy process, I decided to collect 100 lines at a time before calling my main function, and the code below works fine for that. However, if the number of lines is less than 100, my main function is never called. How can I change the code to handle both scenarios, i.e. read at most 100 lines at a time and pass them to the main function, or pass all the lines if there are fewer than 100?

public async void ReadJsonStream()
{
    JsonSerializer serializer = new JsonSerializer();

    using (Stream data = await manager.DownloadBlob(null, "TestMultipleLines.json", null))
    {
        using (StreamReader streamReader = new StreamReader(data, Encoding.UTF8))
        {
            int counter = 1;
            List<string> lines = new List<string>();
            
            while (streamReader.Peek() >= 0)
            {
                lines.Add(streamReader.ReadLine());

                if (counter == 100)
                {
                    counter = 1;
                    // call main function with line
                    lines.Clear();
                }
                counter++;
            }
        }
    }
}

Upvotes: 0

Views: 917

Answers (2)

xanatos

Reputation: 111850

I think what you are trying to do is wrong. How will you parse 100 lines? Do you want to rebuild a JSON deserializer from scratch? And what happens if a piece of JSON is split between line 100 and line 101?

But in the end, you asked for something and I'll give you what you asked for.

public async void ReadJsonStream()
{
    JsonSerializer serializer = new JsonSerializer();

    using (Stream data = await manager.DownloadBlob(null, "TestMultipleLines.json", null))
    {
        using (StreamReader streamReader = new StreamReader(data, Encoding.UTF8))
        {
            List<string> lines = new List<string>();

            string line;

            while ((line = await streamReader.ReadLineAsync()) != null)
            {
                lines.Add(line);

                if (lines.Count == 100)
                {
                    // call main function with lines
                    lines.Clear();
                }
            }

            if (lines.Count != 0)
            {
                // call main function with lines
                lines.Clear(); // useless
            }
        }
    }
}

As others noted, you forgot the additional "call main function with lines" step after the loop ends, which handles the final partial batch. I've also changed two other things. You don't need .Peek(): .ReadLine() returns null at the end of the input stream. And since you made your method async, you can make it fully async by using .ReadLineAsync().
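The same "flush the remainder after the loop" idea can be pulled out into a reusable helper. This is only a minimal sketch: `LineBatcher` and `ReadBatches` are hypothetical names that do not appear in the question, and it uses the synchronous `ReadLine()` to keep the iterator simple.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineBatcher
{
    // Hypothetical helper: yields lists of at most batchSize lines.
    // The last list may be shorter, which also covers inputs that
    // have fewer than batchSize lines in total.
    public static IEnumerable<List<string>> ReadBatches(TextReader reader, int batchSize)
    {
        // Because this is an iterator, the check runs when enumeration starts.
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<string>(batchSize);
        string line;

        // ReadLine() returns null at the end of the input, so no Peek() is needed.
        while ((line = reader.ReadLine()) != null)
        {
            batch.Add(line);

            if (batch.Count == batchSize)
            {
                yield return batch;
                // Start a fresh list so the caller can keep the one it received.
                batch = new List<string>(batchSize);
            }
        }

        // Flush whatever is left over after the loop.
        if (batch.Count > 0)
            yield return batch;
    }
}
```

With a helper like this, the caller would loop with `foreach (var lines in LineBatcher.ReadBatches(streamReader, 100))` and call the main function once per batch, including the final short one.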

Note that the JsonSerializer of Json.NET already has a Deserialize method that accepts a TextReader (and a StreamReader is a TextReader). That method reads the file "a piece at a time" and won't preload it before parsing it.
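To illustrate letting Json.NET do the reading instead of splitting lines by hand, here is a rough sketch. It rests on an assumption the question does not confirm: that the file holds several root-level JSON objects one after another (for example one per line). `JsonStreamReader` and `ReadObjects` are hypothetical names; `JsonTextReader`, `SupportMultipleContent` and `JsonSerializer.Deserialize<T>(JsonReader)` are part of Json.NET.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class JsonStreamReader
{
    // Hypothetical helper: yields one JObject per root-level JSON object.
    // Assumes every root-level value in the input is a JSON object.
    public static IEnumerable<JObject> ReadObjects(TextReader textReader)
    {
        var serializer = new JsonSerializer();

        // SupportMultipleContent lets the reader continue past the end of one
        // root value and on to the next. Disposing the JsonTextReader also
        // closes the underlying TextReader by default.
        using (var jsonReader = new JsonTextReader(textReader) { SupportMultipleContent = true })
        {
            // Each Read() moves to the start of the next root-level value.
            while (jsonReader.Read())
            {
                // Deserialize consumes exactly one complete object, so a value
                // can never be cut in half the way a fixed line count could.
                yield return serializer.Deserialize<JObject>(jsonReader);
            }
        }
    }
}
```

The objects could then be grouped into batches of 100 if the main function really needs batches, without any risk of a JSON value being split across two of them.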

Upvotes: 2

Ankit Giri

Reputation: 2875

Add a check after the while loop. If the lines list is not empty, call main.

public async void ReadJsonStream()
{
    JsonSerializer serializer = new JsonSerializer();

    using (Stream data = await manager.DownloadBlob(null, "TestMultipleLines.json", null))
    {
        using (StreamReader streamReader = new StreamReader(data, Encoding.UTF8))
        {
            int counter = 0;
            List<string> lines = new List<string>();

            while (streamReader.Peek() >= 0)
            {
                lines.Add(streamReader.ReadLine());
                counter++;

                if (counter == 100)
                {
                    // Full batch: reset the counter so the next batch also gets 100 lines
                    counter = 0;
                    // call main function with lines
                    lines.Clear();
                }
            }

            // Leftover lines (fewer than 100) still need to be processed
            if (lines.Count > 0)
            {
                // call main function with lines
            }
        }
    }
}

Upvotes: 1
