Reputation: 39394
I am reading a list of files from a directory and looking for the patterns:
A. [[[Something]]] > Get the string "Something"
B. [[[Something///Comment]]] > Get the strings "Something" and "Comment"
C. [[[Enter between %0 and %1 characters|||Val 1|||Val 2]]] > Get the string before the first ||| which is "Enter between %0 and %1 characters"
So I tried the following:
IList<String> files = Directory.GetFiles(path, "*.cshtml", SearchOption.AllDirectories).ToList();
IDictionary<String, Tuple<Int32, String>> items = new Dictionary<String, Tuple<Int32, String>>();
Regex regex = new Regex(@"\[\[\[.*\]\]\]");
foreach (String file in files) {
  Int32 number = 0; // 1-based line number within the current file
  foreach (String line in File.ReadAllLines(file)) {
    number++;
    MatchCollection matches = regex.Matches(line);
    foreach (Match match in matches) {
      if (match != null) {
        items.Add(match.Value, new Tuple<Int32, String>(number, file));
      }
    }
  }
}
NOTE: I am using ReadAllLines because I need to get the line number of each match I find.
Could I have some help with the following:
When using the Regex @"\[\[\[.*\]\]\]" I found a situation where it does not work:
ViewInfo.Title("[[[Title]]]").Description("[[[Description]]]");
I get Title]]]").Description("[[[Description]]] as a single match.
I have not been able to apply rules (B) and (C).
Is it possible to improve performance, or is my code OK?
Upvotes: 0
Views: 372
Reputation: 51330
You need a non-greedy (lazy) expression: .*? will try to consume as few characters as possible.
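To make the difference concrete, here is a minimal sketch comparing the greedy and the lazy quantifier on the line from the question. The class and member names (GreedyVersusLazy, Capture, Greedy, Lazy) are made up for illustration, and the patterns add a capture group so that only the text between the brackets is returned.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GreedyVersusLazy
{
    // Greedy: .* runs to the LAST ]]] on the line, so two tokens merge into one match.
    public const string Greedy = @"\[\[\[(.*)\]\]\]";

    // Lazy: .*? stops at the FIRST ]]] it can reach, giving one match per token.
    public const string Lazy = @"\[\[\[(.*?)\]\]\]";

    // Returns the text captured between [[[ and ]]] for every match of the pattern.
    public static List<string> Capture(string pattern, string input)
    {
        var results = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern))
        {
            results.Add(match.Groups[1].Value);
        }
        return results;
    }
}
```

Calling Capture with Greedy on the ViewInfo.Title(...) line should give a single merged capture, while Lazy should give Title and Description as two separate captures.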
Try this: @"\[\[\[(?:(.*?)\|\|\|.*?|(.*?)///(.*?)|(.*?))\]\]\]"
(It is important to put the longest possible alternatives first; otherwise the bare .*? alternative could just eat up the whole string.)
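One thing to watch for: a lazy .*? can still run past a closing ]]] if that is the only way for an alternative to succeed, for example when a plain token and a ||| token share a line. The following is a sketch of one possible variant, not the only way to do it. It uses a negative lookahead so that no part of a match can cross ]]], and named groups to pick out the pieces for rules (A), (B) and (C). The names TokenParser, Parse, text and comment are assumptions chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenParser
{
    // (?:(?!\]\]\]).)* means "any characters, as long as no ]]] starts here",
    // which keeps every match inside a single [[[...]]] token.
    private static readonly Regex Token = new Regex(
        @"\[\[\[" +
        // Text: everything up to the first ///, ||| or ]]].
        @"(?<text>(?:(?!\]\]\]|///|\|\|\|).)*)" +
        // Optional tail: either ///comment or |||values (values are not captured).
        @"(?:///(?<comment>(?:(?!\]\]\]).)*)|\|\|\|(?:(?!\]\]\]).)*)?" +
        @"\]\]\]");

    // Returns (text, comment) pairs; comment is null when the token has none.
    public static List<Tuple<string, string>> Parse(string line)
    {
        var result = new List<Tuple<string, string>>();
        foreach (Match match in Token.Matches(line))
        {
            Group comment = match.Groups["comment"];
            result.Add(Tuple.Create(match.Groups["text"].Value,
                                    comment.Success ? comment.Value : null));
        }
        return result;
    }
}
```

With this sketch, Parse applied to the three sample patterns from the question should return Item1 equal to the wanted string in each case, and Item2 equal to "Comment" only for pattern (B).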
Use File.ReadLines along with a variable that you increment at each iteration to count lines. That way you won't have to hold the whole file in memory.
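A rough sketch of that loop is below. The names TokenScanner, Scan and ScanFile are hypothetical. It also assumes that a list of results is acceptable instead of the dictionary from the question, because Dictionary.Add throws when the same string is found twice. For brevity it uses the simple lazy pattern rather than the full one.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class TokenScanner
{
    // Simple lazy pattern; swap in a fuller pattern if rules (B) and (C) are needed.
    private static readonly Regex Token = new Regex(@"\[\[\[(.*?)\]\]\]");

    // Walks the lines once, counting them, and records (text, line number, file) per match.
    public static List<Tuple<string, int, string>> Scan(IEnumerable<string> lines, string file)
    {
        var found = new List<Tuple<string, int, string>>();
        int number = 0;
        foreach (string line in lines)
        {
            number++; // 1-based line number
            foreach (Match match in Token.Matches(line))
            {
                found.Add(Tuple.Create(match.Groups[1].Value, number, file));
            }
        }
        return found;
    }

    // File.ReadLines streams the file lazily instead of loading every line up front.
    public static List<Tuple<string, int, string>> ScanFile(string file)
    {
        return Scan(File.ReadLines(file), file);
    }
}
```

Scan takes any sequence of lines, so it can be tried on an in-memory array first; ScanFile is the version you would call for each path returned by Directory.GetFiles.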
Upvotes: 1