Reputation: 1348
Scenario: a 150MB text file which is the exported Inbox of an old email account. I need to parse through it, pull out the emails from a specific user, and write these to a new, single file. I have code that works; it's just dog slow.
I'm using marker strings to search for where to begin/end the copy from the original file.
Here's the main function:
StreamReader sr = new StreamReader("c:\\Thunderbird_Inbox.txt");
string working = string.Empty;
string mystring = string.Empty;
while (!sr.EndOfStream)
{
    while ((mystring = sr.ReadLine()) != null)
    {
        if (mystring == strBeginMarker)
        {
            writeLog(mystring);
            //read the next line
            working = sr.ReadLine();
            while( !(working.StartsWith(strEndMarker)))
            {
                writeLog(working);
                working = sr.ReadLine();
            }
        }
    }
}
this.Text = "DONE!!";
sr.Close();
The function that writes the selected messages to the new file:
public void writeLog(string sMessage)
{
    fw = new System.IO.StreamWriter(path, true);
    fw.WriteLine(sMessage);
    fw.Flush();
    fw.Close();
}
Again, this process works. I get a good output file, it just takes a long time and I'm sure there are ways to make this faster.
Upvotes: 13
Views: 5190
Reputation: 10823
I would just do a simple parser. Note that this assumes (as you do in your code above) that the markers are in fact unique.
You may have to adjust the formatting of your output a bit, but here is the general idea:
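// Requires: using System; using System.IO; using System.Text;
// Read the entire file and close it
string data;
using (StreamReader sr = new StreamReader("c:\\Thunderbird_Inbox.txt"))
{
    data = sr.ReadToEnd();
}
// StringBuilder avoids re-copying the whole result on every append
StringBuilder newData = new StringBuilder();
int position = data.IndexOf(strBeginMarker, StringComparison.Ordinal);
// IndexOf returns -1 when nothing is found; 0 is a valid match position
while (position >= 0)
{
    int contentStart = position + strBeginMarker.Length;
    int endPosition = data.IndexOf(strEndMarker, contentStart, StringComparison.Ordinal);
    if (endPosition < 0)
    {
        break; // begin marker without a matching end marker
    }
    newData.Append(data, contentStart, endPosition - contentStart);
    // Continue searching after the end marker just consumed
    position = data.IndexOf(strBeginMarker, endPosition + strEndMarker.Length, StringComparison.Ordinal);
}
writeLog(newData.ToString());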
(Note that I don't have a 150 MB file to test this on - YMMV depending on the machine you are using).
Upvotes: 2
Reputation: 564851
The largest optimization would be to change your writeLog method to open the file once at the beginning of this operation, write to it many times, then close it at the end.
Right now, you're opening and closing the file for every line you write, which will definitely slow things down.
Try the following:
// Open this once at the beginning!
using(fw = new System.IO.StreamWriter(path, true))
{
    using(StreamReader sr = new StreamReader("c:\\Thunderbird_Inbox.txt"))
    {
        string working;
        string mystring;
        while ((mystring = sr.ReadLine()) != null)
        {
            if (mystring == strBeginMarker)
            {
                // Write through the open writer; calling writeLog here would
                // try to reopen the file that fw already holds open
                fw.WriteLine(mystring);
                //read the next line
                working = sr.ReadLine();
                // The null check stops cleanly if the file ends before an end marker
                while (working != null && !working.StartsWith(strEndMarker))
                {
                    fw.WriteLine(working);
                    working = sr.ReadLine();
                }
            }
        }
    }
}
this.Text = "DONE!!";
Upvotes: 19
Reputation: 2130
I think you should:
Upvotes: 2
Reputation: 3113
You could simply declare the StreamWriter object outside of that while
loop and just write the line to it inside the loop.
Like this:
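// Open the writer once, outside the loop; using closes it at the end
using (StreamWriter sw = new StreamWriter(path, true))
{
    while ((mystring = sr.ReadLine()) != null)
    {
        // ...
        while (working != null && !working.StartsWith(strEndMarker))
        {
            sw.WriteLine(working);
            working = sr.ReadLine();
        }
    }
}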
Upvotes: 0
Reputation: 6050
I do not have a 150MB text file to test with, but if your machine has the memory, would reading the whole thing into a string and using a RegEx to pull out the messages work?
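A minimal, untested-at-scale sketch of that idea, assuming the markers are plain literal strings. The class name RegexExtractSketch, the method ExtractBetween, and the sample markers "BEGIN" and "END" are made up for illustration and are not from the question:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class RegexExtractSketch
{
    // Returns the text found between each begin/end marker pair, in order.
    // Both markers are treated as literal text (assumption), hence Regex.Escape.
    public static string ExtractBetween(string data, string beginMarker, string endMarker)
    {
        // Singleline lets '.' match newlines so a message can span lines;
        // the lazy '.*?' stops at the first end marker after each begin marker.
        string pattern = Regex.Escape(beginMarker) + "(.*?)" + Regex.Escape(endMarker);
        StringBuilder result = new StringBuilder();
        foreach (Match m in Regex.Matches(data, pattern, RegexOptions.Singleline))
        {
            result.Append(m.Groups[1].Value);
        }
        return result.ToString();
    }

    static void Main()
    {
        // Hypothetical in-memory sample; for the real file you would pass
        // the result of File.ReadAllText on the exported Inbox instead.
        string sample = "junk BEGIN first END junk BEGIN second END";
        Console.WriteLine(ExtractBetween(sample, "BEGIN", "END"));
    }
}
```

Keep in mind that a 150MB file of mostly ASCII text takes roughly twice that as a .NET string, so this trades memory for simplicity; whether it beats the line-by-line loop is something you would have to measure.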
Upvotes: 0