Reputation: 4202
I have a requirement to read and process log file incrementally. Any suggestions on how to do this in Java?
I need to consider all possible scenarios like file rollover, different logging formats, etc.
Upvotes: 0
Views: 7268
Reputation: 4202
Though it's pretty late, I thought I'd write up the approach I used to achieve this.
Let's say we start a job to read a file periodically, after every 5 min.
During first run, read the entire file
Store line count and the last modified time of the file
It becomes interesting for subsequent job runs.
During next job run, check if the file is modified (using file last modified time and the one stored during earlier job run). If the file is not modified, do nothing.
If the file is modified, we just need to read the new lines. We have the line count from the earlier job so use it to determine the number of lines to skip.
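The steps above can be sketched as a small class. This is a minimal sketch, not the answer's actual code; the class name and the idea of returning the new lines as a list are my own:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class IncrementalReader {
    private long storedLineCount = 0;
    private long storedLastModified = 0;

    // Returns only the lines added since the previous run.
    public List<String> readNewLines(Path logFile) throws IOException {
        List<String> newLines = new ArrayList<>();
        long lastModified = logFile.toFile().lastModified();
        if (lastModified == storedLastModified) {
            return newLines; // file unchanged since the last run, do nothing
        }
        try (BufferedReader reader = Files.newBufferedReader(logFile)) {
            String line;
            long lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                // Skip the lines already read during the previous run.
                if (lineNumber > storedLineCount) {
                    newLines.add(line);
                }
            }
            storedLineCount = lineNumber;
        }
        storedLastModified = lastModified;
        return newLines;
    }
}
```

On the first run `storedLineCount` is 0, so the entire file is read; on later runs only the lines past the stored count are returned. In a real job the two stored fields would be persisted between runs.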
So far so good. But what if the file is rolled over?
Assuming we have the pattern for file naming when the file is rolled over...
Get all files matching the pattern and sort them in ascending order based on file last modified time
Iterate through the files, starting with the one whose last modified time is greater than the time stored from the previous job run. Use the stored line count to skip the lines that were already read
Reset line count when you start with a new file thereafter
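The file discovery step can be sketched like this. The directory, naming pattern, and class name are hypothetical; adapt the regex to whatever rollover naming your logging framework uses:

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

public class RolledFileFinder {
    // Returns files matching the rollover naming pattern,
    // sorted ascending by last modified time (oldest first).
    public static File[] findLogFiles(File dir, String nameRegex) {
        File[] files = dir.listFiles((d, name) -> name.matches(nameRegex));
        if (files == null) {
            return new File[0]; // directory missing or not a directory
        }
        Arrays.sort(files, Comparator.comparingLong(File::lastModified));
        return files;
    }
}
```

For a pattern like `app.log`, `app.log.1`, `app.log.2`, a regex such as `app\.log(\.\d+)?` would match the whole set, and sorting by last modified time puts the rolled-over files before the current one.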
That's it!
You may need to add IF conditions in a few places for some odd scenarios. One such scenario: while iterating through the files, if a file's last modified time is exactly the same as the stored one, just reset the line count, so that processing starts with the first line of the next/new file.
Sample code for subsequent job runs:
for (File file : files) {
    if (file.lastModified() > storedLastModifiedTime) {
        // you have the file to process; take care of the line count
    } else if (file.lastModified() == storedLastModifiedTime) {
        // reset stored line count
    }
}
Upvotes: 2
Reputation: 1076
I'm trying to approach pretty much the same problem. It appears it is not as trivial as it might look at first glance. You have to ignore the notion of EOF/EOS and you have to keep track of where in the log file you are.
I think the best approach is to have a separate thread read the log file. I did a test with BufferedReader that is quite promising. The thread reads all the data up to the end of the file (where readLine() returns null) and goes to sleep for N seconds (5 in my case). After waking up, it tries reading a line again. If it gets a String, it goes on with processing; if it gets null, it goes back to sleep. It increments a line counter on every successful read and writes/reads it on stop/start, so it can locate the last position in the log file and proceed from that point.
The only problem with this approach is the N-second wait. It would be far more accurate to have a way to tell Java "block on readLine() regardless of EOF/EOS". With the N-second wait you might be sleeping while data is already available. However, the sleep seems to be necessary unless you want to eat up all the CPU power.
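The polling loop described above might look like this. It's a minimal sketch: the class name, the configurable poll interval, and collecting lines into a list (instead of real processing) are my own choices, not from the answer:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class LogTailer implements Runnable {
    private final String path;
    private final long pollMillis;
    private volatile boolean running = true;
    public final List<String> lines = new CopyOnWriteArrayList<>();

    public LogTailer(String path, long pollMillis) {
        this.path = path;
        this.pollMillis = pollMillis;
    }

    @Override
    public void run() {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            while (running) {
                String line = reader.readLine();
                if (line != null) {
                    lines.add(line);          // new data: process immediately
                } else {
                    Thread.sleep(pollMillis); // at EOF: wait before retrying
                }
            }
        } catch (IOException | InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public void stop() {
        running = false;
    }
}
```

Note that BufferedReader does not latch onto EOF permanently: once the file grows, the next readLine() call returns the newly appended line, which is what makes this polling approach work on a local filesystem.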
Upvotes: 0