Reputation: 702
I have a file, and I want to get the line of the file which matches a regex query.
My code is something like this:
Assembly assembly = typeof(EmbeddedResourceGetter).GetTypeInfo().Assembly;
Stream stream = assembly.GetManifestResourceStream(resourcePath);
StreamReader sr = new StreamReader(stream);
return sr.ReadToEnd()
    .Split('\n').ToList()
    .Find(l => Regex.IsMatch(l, "regex-query-here"));
However, I feel this is quite inefficient, and if I need to repeat it multiple times it can take a long time to complete.
Is there a more efficient way to get a line that matches a regex query without reading the whole file, or will I have to refactor my code in a different way to make it more efficient?
Upvotes: 0
Views: 1050
Reputation: 19149
Find only gets the first match, so if the first match is all you need, don't read the whole file; that is inefficient. Read the file line by line using File.ReadLines.
Also, calling the static Regex.IsMatch on every iteration adds overhead. Create the Regex once and reuse it.
Regex regex = new Regex("regex-query-here");
return File.ReadLines(path).FirstOrDefault(l => regex.IsMatch(l));
File.ReadLines loads only one line into memory at a time. FirstOrDefault stops iterating as soon as the first match is found, so if your match is on the 23rd line, only 23 lines are read from the file before you get your result.
Reading the whole file into memory may be faster, but that is a trade-off between memory and performance.
Another thing I have to mention is that splitting by \n is not a cross-platform way to get lines.
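Since the question reads from an embedded resource stream rather than a file path, the same idea can be applied to a StreamReader directly. The following is only a minimal sketch: the class name LineFinder and the method name FindFirstMatchingLine are made up for illustration, and it assumes the caller passes in the stream (for example, the one returned by GetManifestResourceStream) together with a Regex that was created once.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class LineFinder
{
    // Returns the first line that matches the regex, or null if no line matches.
    // Reading stops at the first match, so later lines are never examined.
    public static string FindFirstMatchingLine(Stream stream, Regex regex)
    {
        using (var reader = new StreamReader(stream))
        {
            string line;
            // ReadLine recognises "\n", "\r" and "\r\n" as line terminators,
            // so no manual Split('\n') is needed.
            while ((line = reader.ReadLine()) != null)
            {
                if (regex.IsMatch(line))
                {
                    return line;
                }
            }
        }
        return null;
    }
}
```

If the search has to be repeated many times, keeping the Regex in a static readonly field and passing it in on each call avoids constructing it again.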
Upvotes: 2
Reputation: 9657
You should read the file once and store it in a variable, because I/O operations are expensive. Then run the regex on the variable.
When you read your file into a variable, you copy it from the hard disk to RAM. Accessing RAM is fast; the hard disk is slow. It is best to read from the hard disk only once.
Also, reading line by line fails if you want to match a multiline pattern.
For example:
Can
you
match
me
if
you
read
me
line
by
line?
The regex "Can\s+you" would fail to match in this case, because "Can" and "you" never end up in the same string.
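To make the difference concrete, here is a small sketch that runs the same pattern against the whole text and against each line separately. The class name MultilineDemo and its method names are invented for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class, only for illustrating the point above.
public static class MultilineDemo
{
    // Created once and reused for every call.
    private static readonly Regex Pattern = new Regex(@"Can\s+you");

    // Matches against the full text, so \s+ can consume the line break.
    public static bool MatchesWholeText(string text)
    {
        return Pattern.IsMatch(text);
    }

    // Matches each line on its own; "Can" and "you" are never in the same string.
    public static bool MatchesAnyLine(string text)
    {
        return text.Split('\n').Any(line => Pattern.IsMatch(line));
    }
}
```

For the text "Can\nyou\nmatch\nme", MatchesWholeText finds the pattern while MatchesAnyLine does not, so line-by-line reading only suits patterns that fit on a single line.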
Upvotes: 1