Reputation: 387
I am currently developing an application that reads a text file of about 50000 lines. For each line, I need to check whether it contains a specific string.
At the moment, I use the conventional System.IO.StreamReader to read my file line by line.
The problem is that the size of the text file changes each time. I ran several performance tests and noticed that the larger the file, the longer it takes to read each line.
Reading a text file that contains 5000 lines: 0:40
Reading a text file that contains 10000 lines: 2:54
It takes 4 times longer to read a file that is 2 times larger. I can't imagine how long it would take to read a 100000-line file.
Here's my code:
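    using (StreamReader streamReader = new StreamReader(this.MyPath))
    {
        string line;
        // ReadLine() returns null at end of file. The original Peek() > 0 check
        // would also stop early if the next character were '\0'.
        while ((line = streamReader.ReadLine()) != null)
        {
            if (line.Contains(Resources.Constants.SpecificString))
            {
                // Do some action with the string.
            }
        }
    }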
Is there a way to keep the time it takes to read a single line from growing as the file gets bigger?
Upvotes: 2
Views: 5222
Reputation: 1422
Use Regex.IsMatch and you should see some performance improvement.
    using (StreamReader streamReader = new StreamReader(this.MyPath))
    {
        // Create the Regex once, outside the loop, so it is compiled only once.
        var regEx = new Regex(MyPattern, RegexOptions.Compiled);
        string line;
        // ReadLine() returns null at end of file.
        while ((line = streamReader.ReadLine()) != null)
        {
            if (regEx.IsMatch(line))
            {
                // Do some action with the string.
            }
        }
    }
Remember to use a compiled Regex (RegexOptions.Compiled), however. Here's a pretty good article with some benchmarks you can look at.
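The snippet above uses a placeholder, MyPattern. If the text being searched for is a plain literal such as Resources.Constants.SpecificString, it likely needs escaping first, so that characters like `.` or `(` are not interpreted as regex syntax. Below is a minimal sketch, assuming the lines are already available as an `IEnumerable<string>` (for example from File.ReadLines); `LiteralLineFilter` and `MatchingLines` are hypothetical names. Whether this actually beats string.Contains for a plain literal is worth measuring on your own data.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: filters lines by a literal string using one compiled Regex.
public static class LiteralLineFilter
{
    public static IEnumerable<string> MatchingLines(IEnumerable<string> lines, string literal)
    {
        // Regex.Escape escapes metacharacters ('.', '(', '*', ...) so the
        // pattern matches the literal text verbatim.
        // RegexOptions.Compiled pays a one-time compilation cost up front;
        // the Regex is built once and reused for every line.
        var regex = new Regex(Regex.Escape(literal), RegexOptions.Compiled);
        foreach (var line in lines)
        {
            if (regex.IsMatch(line))
                yield return line;
        }
    }
}
```

For example, searching for "a.b" matches the line "a.b" but not "axb", because the escaped dot no longer means "any character".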
Happy coding!
Upvotes: 0
Reputation: 726569
Try this:
    // Requires System.IO (File.ReadLines) and System.Linq (Where).
    var toSearch = Resources.Constants.SpecificString;
    foreach (var str in File.ReadLines(MyPath).Where(s => s.Contains(toSearch)))
    {
        // Do some action with the string
    }
This avoids accessing the resources on each iteration by caching the value before the loop. If this does not help, try writing your own Contains based on an advanced string-searching algorithm such as KMP (Knuth-Morris-Pratt).
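A minimal sketch of such a Contains using Knuth-Morris-Pratt might look like the following. `KmpSearch`, `BuildFailureTable` and this `Contains` overload are hypothetical names, not part of .NET. The failure table depends only on the pattern, so it can be built once and reused for every line. Note that the built-in string.Contains is already heavily optimized in recent .NET versions, so a hand-written KMP may not be faster; measure before switching.

```csharp
// Hypothetical helper: Knuth-Morris-Pratt substring search (ordinal, char by char).
public static class KmpSearch
{
    // failure[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    public static int[] BuildFailureTable(string pattern)
    {
        var failure = new int[pattern.Length];
        int k = 0; // length of the current matching prefix
        for (int i = 1; i < pattern.Length; i++)
        {
            // On mismatch, fall back to the next shorter border instead of restarting.
            while (k > 0 && pattern[i] != pattern[k])
                k = failure[k - 1];
            if (pattern[i] == pattern[k])
                k++;
            failure[i] = k;
        }
        return failure;
    }

    // Returns true if text contains pattern; each text character is examined
    // a bounded number of times, so the scan is linear in text.Length.
    public static bool Contains(string text, string pattern, int[] failure)
    {
        if (pattern.Length == 0) return true;
        int k = 0; // number of pattern characters matched so far
        for (int i = 0; i < text.Length; i++)
        {
            while (k > 0 && text[i] != pattern[k])
                k = failure[k - 1];
            if (text[i] == pattern[k])
                k++;
            if (k == pattern.Length)
                return true;
        }
        return false;
    }

    // Convenience overload that builds the table on each call.
    public static bool Contains(string text, string pattern) =>
        Contains(text, pattern, BuildFailureTable(pattern));
}
```

In the loop, you would call `BuildFailureTable(toSearch)` once before iterating and pass the table to `Contains(line, toSearch, table)` for each line.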
Note: be sure to use File.ReadLines, which reads lines lazily, unlike the similar-looking File.ReadAllLines, which reads all lines into memory at once.
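As a self-contained illustration of the lazy approach, here is a sketch that counts matching lines; `LineSearch` and `CountMatchingLines` are hypothetical names. Because File.ReadLines yields one line at a time, only the current line needs to be held in memory, whereas File.ReadAllLines would first build a `string[]` containing every line of the file.

```csharp
using System.IO;
using System.Linq;

// Hypothetical helper: counts lines in a file that contain a given string.
public static class LineSearch
{
    // File.ReadLines streams the file lazily; Count consumes it line by line.
    // string.Contains(string) performs an ordinal comparison.
    public static int CountMatchingLines(string path, string toSearch) =>
        File.ReadLines(path).Count(line => line.Contains(toSearch));
}
```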
Upvotes: 7