Rod

Reputation: 15477

appending and reading text file

Environment: any .NET Framework solution is welcome. I have a log file that gets written to 24/7.

I am trying to create an application that will read the log file and process the data.

What's the best way to read the log file efficiently? I imagine monitoring the file with something like FileSystemWatcher. But how do I make sure I don't read the same data once it's been processed by my application? Or say the application aborts for some unknown reason, how would it pick up where it left off last?

There's usually a header and footer around the payload in the log file, and maybe an id field in the content as well, though I'm not sure yet whether the id field will be there.

I also imagined saving the count of lines read somewhere and using that as a bookmark.
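Roughly, the kind of thing I have in mind is sketched below. It's naive (the bookmark only lives in memory, a partially written last line isn't handled, and the paths and names are just placeholders), but it shows the watcher-plus-bookmark idea:

    using System;
    using System.IO;

    class LogTailer
    {
        // bookmark for how far we've read; a real app would persist this
        static long lastPosition = 0;

        static void Main()
        {
            var watcher = new FileSystemWatcher(@"c:\logs", "app.log");
            watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Size;
            watcher.Changed += (sender, e) => ReadNewLines(e.FullPath);
            watcher.EnableRaisingEvents = true;

            Console.WriteLine("Watching... press Enter to quit.");
            Console.ReadLine();
        }

        static void ReadNewLines(string path)
        {
            // FileShare.ReadWrite lets the 24/7 writer keep appending while we read
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            using (var reader = new StreamReader(fs))
            {
                fs.Seek(lastPosition, SeekOrigin.Begin);
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    Console.WriteLine(line); // process the line here
                }
                lastPosition = fs.Position; // move the bookmark to the end of what was read
            }
        }
    }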

Upvotes: 2

Views: 1249

Answers (4)

akton

Reputation: 14386

Is there a reason why it logs to a file? Files are great because they are simple to use and, being the lowest common denominator, there is relatively little that can go wrong. However, files are limited. As you say, there's no guarantee a write to the file will be complete when you read the file. Multiple applications writing to the log can interfere with each other. There is no easy sorting or filtering mechanism. Log files can grow very big very quickly and there's no easy way to move old events (say those more than 24 hours old) into separate files for backup and retention.

Instead, I would consider writing the logs to a database. The table structure can be very simple but you get the advantage of transactions (so you can extract or back up with ease) and can search, sort and filter using an almost universally understood syntax. If you are worried about load spikes, use a message queue, like http://msdn.microsoft.com/en-us/library/ms190495.aspx for SQL Server.
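As a sketch of how small the write side can be with plain ADO.NET (the table name, columns and connection string here are assumptions, not a prescribed schema):

    using System;
    using System.Data.SqlClient;

    static void WriteLogEvent(string connectionString, string message)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "INSERT INTO LogEvents (LoggedAt, Message) VALUES (@loggedAt, @message)", conn))
        {
            cmd.Parameters.AddWithValue("@loggedAt", DateTime.UtcNow);
            cmd.Parameters.AddWithValue("@message", message);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }

The reading side then becomes an ordinary SELECT with a WHERE clause instead of file parsing and bookmarking.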

To make the transition easier, consider using a logging framework like log4net. It abstracts much of this away from your code.
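For instance, with log4net the calling code does not change when you switch where events go; the appender (file, database, message queue) is chosen in configuration. A minimal sketch:

    using log4net;
    using log4net.Config;

    class Program
    {
        // one logger per class is the usual log4net convention
        private static readonly ILog Log = LogManager.GetLogger(typeof(Program));

        static void Main()
        {
            XmlConfigurator.Configure(); // load appender settings from App.config
            Log.Info("Application started");
            Log.Error("Something went wrong");
        }
    }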

Another alternative is to use a system like syslog or, if you have multiple servers and a large volume of logs, flume. By moving the log files away from the source computer, you can store them or inspect them on a different machine far more effectively. However, these are probably overkill for your current problem.

Upvotes: 1

Despertar

Reputation: 22392

I think you will find the File.ReadLines(filename) function in conjunction with LINQ will be very handy for something like this. ReadAllLines() loads the entire text file into memory as a string[] array, but ReadLines allows you to begin enumerating the lines immediately as it traverses the file. This not only saves you time but keeps the memory usage very low, as it processes each line one at a time. The using statements are important because, if this program is interrupted, they close the file streams, flushing the writer and saving unwritten content to the file. Then when it starts up again it will skip all the lines that were already read.

int readCount = File.ReadLines("readLogs.txt").Count();
using (FileStream readLogs = new FileStream("readLogs.txt", FileMode.Append))
using (StreamWriter writer = new StreamWriter(readLogs))
{
    // skip the lines that were already processed on a previous run
    IEnumerable<string> lines = File.ReadLines("bigLogFile.txt").Skip(readCount);
    foreach (string line in lines)
    {
        // do something with line or batch them if you need more than one
        writer.WriteLine(line); // record the line as processed
    }
}

As MaciekTalaska mentioned, I would strongly recommend using a database if this is something written to 24/7 and will get quite large. File systems are simply not equipped to handle such volume and you will spend a lot of time trying to invent solutions where a database could do it in a breeze.

Upvotes: 1

aiodintsov

Reputation: 2605

Well, you will have to figure out the magic for your particular case yourself. If you are going to use a well-known text encoding it may be pretty simple, though. Look at System.IO.StreamReader, its ReadLine() and DiscardBufferedData() methods, and its BaseStream property. You should be able to remember your last position in the file, rewind to that position later and start reading again, given that you are sure the file is only appended to. There are other things to consider, though, and there is no single universal answer to this.

Just as a naive example (you may still need to adjust a lot to make it work):

    static void Main(string[] args)
    {
        string filePath = @"c:\log.txt";
        string positionFilePath = filePath + ".lastposition";
        // FileShare.ReadWrite lets the logger keep appending while we read
        using (var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var streamReader = new StreamReader(stream, Encoding.Unicode))
        {
            long pos = 0;
            if (File.Exists(positionFilePath))
            {
                string strPos = File.ReadAllText(positionFilePath);
                pos = Convert.ToInt64(strPos);
            }
            streamReader.BaseStream.Seek(pos, SeekOrigin.Begin); // rewind to the last saved position
            streamReader.DiscardBufferedData(); // clear the buffer so it matches the new position
            for (;;)
            {
                string line = streamReader.ReadLine();
                if (line == null) break;

                ProcessLine(line);
            }
            // once ReadLine() returns null, BaseStream.Position is at the end of the file
            File.WriteAllText(positionFilePath, streamReader.BaseStream.Position.ToString());
        }
    }

    static void ProcessLine(string line)
    {
        // placeholder for whatever processing the application needs
    }

Upvotes: 1

Maciek Talaska

Reputation: 1638

For obvious reasons, reading the whole content of the file, as well as removing lines from the log file (after loading them into your application), is out of the question.

What I can think of as a partial solution is having a small database (probably something much smaller than a full-blown MySQL/MS SQL/PostgreSQL instance) and populating a table with what has been read from the log file. I am pretty sure that even if the power is cut off and the machine is booted again, most relational databases should be able to restore their state with ease. This solution requires some data that could be used to identify the row from the log file (for example: the exact time of the action logged, the machine on which the action took place, etc.).
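As a sketch of that idea, assuming SQLite (via the System.Data.SQLite package) and a table with a UNIQUE constraint on the identifying columns (all names here are made up):

    using System;
    using System.Data.SQLite; // System.Data.SQLite NuGet package

    static void RecordProcessedEntry(SQLiteConnection conn,
                                     DateTime loggedAt, string machine, string line)
    {
        // With UNIQUE(LoggedAt, Machine) on the table, re-processing after
        // a crash is harmless: rows already recorded are skipped silently.
        using (var cmd = new SQLiteCommand(
            "INSERT OR IGNORE INTO ProcessedEntries (LoggedAt, Machine, Line) " +
            "VALUES (@t, @m, @l)", conn))
        {
            cmd.Parameters.AddWithValue("@t", loggedAt);
            cmd.Parameters.AddWithValue("@m", machine);
            cmd.Parameters.AddWithValue("@l", line);
            cmd.ExecuteNonQuery();
        }
    }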

Upvotes: 1
