Shailendra

Reputation: 39

Reading a large text file in chunks, line by line

Suppose I have to read the following lines from a text file:

INFO  2014-03-31 00:26:57,829 332024549ms Service1 startmethod - FillPropertyColor end
INFO  2014-03-31 00:26:57,829 332024549ms Service1 getReports_Dataset - getReports_Dataset started
INFO  2014-03-31 00:26:57,829 332024549ms Service1 cheduledGeneration - SwitchScheduledGeneration start
INFO  2014-03-31 00:26:57,829 332024549ms Service1 cheduledGeneration - SwitchScheduledGeneration limitId, subscriptionId, limitPeriod, dtNextScheduledDate,shoplimittype0, 0, , 3/31/2014 12:26:57 AM,0

I use a FileStream to read the text file because it is over 1 GB in size. I have to read the file in chunks: on the first run the program should read two lines, i.e. up to "getReports_Dataset started" at the end of the second line, and on the next run it should start reading from the third line. I wrote the code below, but it does not produce the desired output. The first problem is that it doesn't give me the exact position from which to start reading on the next run. The second problem is that the lines it reads are not complete, i.e. parts of some lines are missing. Here is the code:

readPosition = getLastReadPosition();
using (FileStream fStream = new FileStream(logFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (System.IO.StreamReader rdr = new System.IO.StreamReader(fStream))
{
    rdr.BaseStream.Seek(readPosition, SeekOrigin.Begin);
    while (numCharCount > 0)
    {
        int numChars = rdr.ReadBlock(block, 0, block.Length);

        string blockString = new string(block);
        lines = blockString.Split(Convert.ToChar('\r'));
        lines[0] = fragment + lines[0];
        fragment = lines[lines.Length - 1];

        foreach (string line in lines)
        {
            lstTextLog.Add(line);
            if (lstTextLog.Contains(fragment))
            {
                lstTextLog.Remove(fragment);
            }
            numProcessedChar++;
        }
        numCharCount--;
    }
    SetLastPosition(numProcessedChar, logFilePath);
}

Upvotes: 0

Views: 3912

Answers (1)

Jim Mischel

Reputation: 133995

If you want to read a file line-by-line, do this:

foreach (string line in File.ReadLines("filename"))
{
    // process line here
}

If you really must read a line and save the position, you need to save the last line number read, rather than the stream position. For example:

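// Requires: using System.IO; using System.Linq;
int lastLineRead = getLastLineRead();
// Skip the lines already processed and take the next one (null at end of file).
string nextLine = File.ReadLines(logFilePath).Skip(lastLineRead).FirstOrDefault();
if (nextLine != null)
{
    // process nextLine here, then remember how many lines have been read
    lastLineRead++;
    SetLastPosition(lastLineRead, logFilePath);
}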
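
The same idea extends to reading several lines per run, as the question asks. The following is only a sketch: LogBatchReader and ReadBatch are hypothetical names, not part of any library, and the batch size is whatever the caller picks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LogBatchReader
{
    // Hypothetical helper: returns up to batchSize lines that follow the
    // first linesAlreadyRead lines of the file.
    public static List<string> ReadBatch(string path, int linesAlreadyRead, int batchSize)
    {
        // File.ReadLines is lazy, so the whole file is never held in memory,
        // but Skip still has to scan from the start of the file on every call.
        return File.ReadLines(path)
                   .Skip(linesAlreadyRead)
                   .Take(batchSize)
                   .ToList();
    }
}
```

A run would then call ReadBatch(logFilePath, lastLineRead, 2), process the returned lines, and add the returned count to lastLineRead before saving it. Because every call rescans the skipped lines, this gets slower as the saved line number grows, which may matter for a file over 1 GB.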

The reason you can't do it by saving the base stream position is that StreamReader reads a large buffer of data from the base stream, which moves the file pointer forward by the buffer size. StreamReader then satisfies read requests from that buffer until it has to read the next buffer. For example, say you open a StreamReader and ask for a single character. Assuming a buffer size of 4 kilobytes, StreamReader does essentially this:

if (buffer is empty)
{
    read buffer (4,096 bytes) from base stream
    buffer_position = 0;
}
char c = buffer[buffer_position];
buffer_position++;    // increment position for next read
return c;

Now, if you ask for the base stream's position, it's going to report that the position is at 4096, even though you've only read one character from the StreamReader.
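
If a byte offset really is needed (for example, to avoid rescanning a very large file), one alternative, which is a different technique from the line-number approach above, is to skip StreamReader and count the bytes yourself. The sketch below is hypothetical (OffsetLineReader and its ReadLines method are made-up names) and assumes UTF-8 or ASCII text with "\n" or "\r\n" line endings and no byte order mark.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class OffsetLineReader
{
    // Hypothetical helper: appends up to maxLines complete lines, starting at
    // byte offset startOffset, to output. Returns the byte offset at which the
    // next call should resume.
    public static long ReadLines(string path, long startOffset, int maxLines, List<string> output)
    {
        long nextOffset = startOffset;
        int linesRead = 0;
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            fs.Seek(startOffset, SeekOrigin.Begin);
            var lineBytes = new MemoryStream();
            long consumed = 0; // bytes read since startOffset
            int b;
            while (linesRead < maxLines && (b = fs.ReadByte()) != -1)
            {
                consumed++;
                if (b != '\n')
                {
                    lineBytes.WriteByte((byte)b);
                    continue;
                }
                // A full line has been seen: decode it and drop a trailing '\r'.
                output.Add(Encoding.UTF8.GetString(lineBytes.ToArray()).TrimEnd('\r'));
                linesRead++;
                lineBytes.SetLength(0);
                // Advance the resume point only past complete lines.
                nextOffset = startOffset + consumed;
            }
        }
        return nextOffset;
    }
}
```

Here the returned offset always points just past the last complete line, so a partially written final line is left for a later run instead of being split. Saving that value and passing it back as startOffset on the next run resumes at the start of the next line.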

Upvotes: 3
