paneerlovr

Reputation: 111

How to apply a filter to a StringReader in a memory efficient way?

I wrote the following piece of code a while back. The idea was to take a regular expression and check whether each line of a file matches it. If a line matches, it is allowed to flow through into the stream. The code performs poorly on a large data set (5 GB) because it reads through the whole file and holds every matching line in memory in order to create the new stream.

    public static StringReader GetReader(String fileName, Regex r)
    {
        // Collects every matching line in memory before returning.
        var lines = new List<string>();
        using (var sr = new StreamReader(fileName))
        {
            while (!sr.EndOfStream)
            {
                var stringContents = sr.ReadLine();
                if (r.IsMatch(stringContents))
                {
                    lines.Add(stringContents);
                }
            }
        }
        return new StringReader(String.Join(Environment.NewLine, lines));
    }

The consumer of the new StringReader is a CsvReader (from the LumenWorks package on NuGet), a class that takes a StringReader and streams data from it to provide CSV access.

I want to write something new that does not load all the data into any object. Instead, I'd like to stream the data out and filter it as I stream it. This should reduce my memory footprint.

My idea right now is to extend StringReader and override methods such as ReadLine. The issue is that my regular expression matches on whole lines, but users of my new regex-filtering StringReader are not forced to retrieve data with ReadLine(). There are many other methods they might call on a StringReader (Read, ReadBlock, ReadToEnd, etc.), and I'm not sure how to make the line-based filter apply to all of them.
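To make the idea concrete, here is a rough sketch of the direction I am considering. It rests on assumptions I have not verified against my consumer: it derives from TextReader instead of StringReader (so it only helps if the consumer accepts any TextReader), and the class name RegexFilteringReader is just a placeholder. It buffers one matching line at a time and overrides only Peek() and Read(), relying on the base TextReader implementations of ReadLine, ReadBlock and ReadToEnd being built on those two methods.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical class: exposes only the lines of an inner reader that match a regex.
// At most one line is held in memory at any time.
public class RegexFilteringReader : TextReader
{
    private readonly TextReader _inner;
    private readonly Regex _filter;
    private string _current; // current matching line plus a trailing newline
    private int _pos;        // index of the next unread character in _current

    public RegexFilteringReader(TextReader inner, Regex filter)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        if (filter == null) throw new ArgumentNullException("filter");
        _inner = inner;
        _filter = filter;
    }

    // Makes sure _current has unread characters; returns false at end of input.
    private bool FillBuffer()
    {
        while (_current == null || _pos >= _current.Length)
        {
            string line;
            do
            {
                line = _inner.ReadLine();
                if (line == null)
                {
                    _current = null;
                    return false;
                }
            } while (!_filter.IsMatch(line));

            // Unlike the String.Join version, every line (including the last)
            // is followed by a newline.
            _current = line + Environment.NewLine;
            _pos = 0;
        }
        return true;
    }

    public override int Peek()
    {
        return FillBuffer() ? _current[_pos] : -1;
    }

    public override int Read()
    {
        return FillBuffer() ? _current[_pos++] : -1;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

With this shape, something like `new RegexFilteringReader(new StreamReader(fileName), r)` could be handed to whatever reads the data, and the filter would apply no matter which read method is called. Going character by character through Read() is probably slow for 5 GB, so overriding Read(char[], int, int) to copy straight from the buffered line might be a worthwhile follow-up.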

So my question is:

How to apply a filter to a StringReader in a memory efficient way while still preserving the concept that it is a StringReader?

Any advice would be appreciated.

Upvotes: 1

Views: 159

Answers (0)