Reputation: 7666
This is tangentially related to an earlier question of mine.
Essentially, the solution in that question worked great, but now I need to adapt it to work in a much larger analysis application. Simply using StreamReader.ReadToEnd() is not acceptable, since some of the files I will be reading are very, very large. If there has been a mistake and someone forgot to clean up, they can theoretically be gigabytes in size. Obviously I can't just read to the end of that.
Unfortunately, the normal ReadLine approach is also not acceptable, because some of the rows of data I am reading contain stack traces, which obviously use \r\n in their formatting. Ideally, I would like to tell the program to read forward until it hits a match for a regex, and then return what it has read. Is there any functionality to do this in .NET? If not, can I get some suggestions for how I'd go about writing it?
Edit: To make it a bit easier to follow my question, here's a paste of some of the important parts of the adapted code:
foreach (var fileString in logpath.Select(log => new StreamReader(log)).Select(fileStream => fileStream.ReadToEnd()))
{
    const string junkPattern = @"\[(?<junk>[0-9]*)\] \((?<userid>.{0,32})\)";
    const string severityPattern = @"INFO|ERROR|FATAL";
    const string datePattern = "^(?=[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3})";
    var records = Regex.Split(fileString, datePattern, RegexOptions.Multiline);
    foreach (var record in records.Where(x => string.IsNullOrEmpty(x) == false))
    ......
The problem lies in the foreach: .Select(fileStream => fileStream.ReadToEnd()) is going to blow up memory badly, I just know it.
Upvotes: 5
Views: 2413
Reputation: 28366
First of all, you should move your const definitions to the class declaration. It makes no difference to the compiled code, since constants are substituted at compile time, but doing it yourself makes the code more readable.
As @Blam mentioned, you should use StringBuilder and StreamReader.ReadLine together, something like this:
foreach (var filePath in logpath)
{
    var sbRecord = new StringBuilder();
    using (var reader = new StreamReader(filePath))
    {
        string line;
        // ReadLine returns null at end of file, which also covers empty files
        while ((line = reader.ReadLine()) != null)
        {
            // a line matching datePattern starts a new record
            if (Regex.IsMatch(line, datePattern) && sbRecord.Length > 0)
            {
                // your method for handling a complete log record
                HandleRecord(sbRecord.ToString());
                sbRecord.Clear();
            }
            // the line belongs to the current record either way
            sbRecord.AppendLine(line);
        }
    }
    // flush the last record of the file
    if (sbRecord.Length > 0)
    {
        HandleRecord(sbRecord.ToString());
    }
}
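If you would rather keep the foreach style from the question, the same idea can be wrapped in an iterator that yields one record at a time. This is only a minimal sketch: the names LogRecordReader and ReadRecords are made up for illustration, and the pattern is the question's timestamp pattern with the lookahead dropped, since IsMatch on a single line does not need it.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original code.
public static class LogRecordReader
{
    // A line that starts with a timestamp begins a new record.
    private static readonly Regex RecordStart = new Regex(
        @"^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}",
        RegexOptions.Compiled);

    // Lazily yields complete records; only the current record is held in memory.
    public static IEnumerable<string> ReadRecords(TextReader reader)
    {
        var record = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (RecordStart.IsMatch(line) && record.Length > 0)
            {
                yield return record.ToString();
                record.Clear();
            }
            record.AppendLine(line);
        }
        // The last record has no following timestamp line, so emit it here.
        if (record.Length > 0)
        {
            yield return record.ToString();
        }
    }
}
```

With that in place, the body of the using block becomes foreach (var record in LogRecordReader.ReadRecords(reader)) HandleRecord(record); and multi-line stack traces stay attached to the record they belong to.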
If I misunderstood something about your problem, please clarify it in a comment.
Also, you can use the ThreadPool to schedule the processing of your records, to speed up your application.
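A rough sketch of what that could look like follows. RecordDispatcher and ProcessAll are hypothetical names, and the sketch assumes your record handler is thread-safe and does not care about the order in which records are processed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper, not part of the original code.
public static class RecordDispatcher
{
    // Queues handler(record) on the thread pool for every record
    // and blocks until all queued work items have finished.
    public static void ProcessAll(IEnumerable<string> records, Action<string> handler)
    {
        // The initial count of 1 is held by this method until everything is queued.
        using (var done = new CountdownEvent(1))
        {
            foreach (var record in records)
            {
                done.AddCount();
                var current = record; // capture a per-iteration copy for the lambda
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try { handler(current); }
                    finally { done.Signal(); }
                });
            }
            done.Signal(); // release the count held by this method
            done.Wait();   // wait for the outstanding work items
        }
    }
}
```

Keep in mind that this queues records as fast as they are read, so with gigabyte-sized files and a slow handler the pending records can pile up in memory again. If that turns out to be a problem, you would need some way to limit how many records are in flight at once.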
Upvotes: 1