Bluemarble

Reputation: 2069

Efficient way of Reading a very large text file from a specific line number to the end in C#

I have a requirement where I need to read a very large text file (~2 GB), but only from a specific line number to the end of the file.

I cannot load the whole text into memory due to performance issues, so I have used StreamReader. But I noticed that there is no easy way to start reading from a specific line number. Instead, I read the file from line 1 and ignore all the lines before my desired line number.

Is this the correct approach? This is what I have tried. Is there a better way of achieving this?

static string ReadLogFileFromSpecificLine(int LineNumber)
{
    string content = null;

    using (StreamReader sr = new StreamReader(LogFilePath))
    {
        sr.ReadLine();
        int currentLineNumber = 0;

        string line;
        while ((line = sr.ReadLine()) != null)
        {
            currentLineNumber++;
            if (currentLineNumber >= LineNumber - 1)
            {
                content += line + "\n";
            }
        }
    }
    return content;
}

Upvotes: 0

Views: 866

Answers (2)

Tim Schmelter

Reputation: 460340

Yes, using a StreamReader is the way to go (a MemoryMappedFile is overkill in this case). But you can simplify it by hiding the StreamReader behind File.ReadLines, which does not read all lines into memory (as opposed to File.ReadAllLines). You should also use a StringBuilder:

IEnumerable<string> lines = File.ReadLines(LogFilePath).Skip(LineNumber);
string content = new StringBuilder().AppendJoin(Environment.NewLine, lines).ToString();
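
For anyone who wants to drop this into a project, here is a self-contained sketch of the same idea. It assumes the line number is 1-based and that the requested line itself should be included, which is why it skips lineNumber - 1 lines (Skip(n) discards the first n items). The class and method names are made up for illustration, and string.Join is swapped in for AppendJoin only so the sketch also compiles on older frameworks.

```csharp
using System;
using System.IO;
using System.Linq;

static class LogTail
{
    // Hypothetical helper: returns everything from the 1-based line
    // `lineNumber` to the end of the file as a single string.
    public static string ReadFromLine(string path, int lineNumber)
    {
        if (lineNumber < 1)
            throw new ArgumentOutOfRangeException(nameof(lineNumber));

        // File.ReadLines is lazy, so the skipped lines are read and
        // discarded one at a time instead of being held in memory.
        var lines = File.ReadLines(path).Skip(lineNumber - 1);

        // Joins the remaining lines without repeated string concatenation.
        return string.Join(Environment.NewLine, lines);
    }
}
```

Calling LogTail.ReadFromLine(LogFilePath, 3) on a four-line file would return lines 3 and 4 separated by Environment.NewLine, with no trailing newline.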

Upvotes: 3

Jonathan Wood

Reputation: 67345

This is the correct approach. There is no algorithm to figure out the offset of a particular line and seek to it.

You may be able to squeeze a little more performance out of it by having two loops. Once the starting line is reached, you could move to the second loop, which wouldn't need to check the line number. But that would improve performance only minimally at best.
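
A rough sketch of the two-loop idea, assuming a 1-based line number. The class and method names are hypothetical; the method yields lines lazily so the caller can process them one at a time.

```csharp
using System.Collections.Generic;
using System.IO;

static class TwoLoopReader
{
    // Hypothetical helper: lazily yields every line from the 1-based
    // line `lineNumber` to the end of the file.
    public static IEnumerable<string> ReadLinesFrom(string path, int lineNumber)
    {
        using (var reader = new StreamReader(path))
        {
            // First loop: discard the lines before the starting line.
            for (int i = 1; i < lineNumber; i++)
            {
                if (reader.ReadLine() == null)
                    yield break; // the file has fewer lines than requested
            }

            // Second loop: no line-number check is needed any more.
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }
}
```

Because the method is an iterator, the file is only opened when enumeration starts and is closed when enumeration finishes or the enumerator is disposed.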

In addition, I'm not sure what you need to do with those lines. You can process one line at a time and avoid loading all of them into memory at once, or you could collect them in a List<string>. If you want all the lines in a single string, use ReadToEnd(), as @Fildor recommended. Do not concatenate the lines as you are doing; that is extremely inefficient.
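
If a single string really is what you need, the ReadToEnd() variant might look roughly like this. It assumes a 1-based line number, and the names are invented for the example. Note that the result keeps the file's original line endings, and that the tail of a ~2 GB file may still be too large to hold comfortably in one string.

```csharp
using System.IO;

static class TailReader
{
    // Hypothetical helper: skips to the 1-based line `lineNumber`,
    // then reads the rest of the file in one call.
    public static string ReadToEndFromLine(string path, int lineNumber)
    {
        using (var reader = new StreamReader(path))
        {
            // Discard the lines before the starting line.
            for (int i = 1; i < lineNumber; i++)
            {
                if (reader.ReadLine() == null)
                    return string.Empty; // fewer lines than requested
            }

            // Everything that is left, without per-line concatenation.
            return reader.ReadToEnd();
        }
    }
}
```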

Upvotes: 4
