Miguel Moura

Reputation: 39394

Find Matches using Regex in File Lines

I am reading a list of files from a directory and looking for the patterns:

A. [[[Something]]] > Get the string "Something"

B. [[[Something///Comment]]] > Get the strings "Something" and "Comment"

C. [[[Enter between %0 and %1 characters|||Val 1|||Val 2]]] > Get the string before the first |||, which is "Enter between %0 and %1 characters"

So I tried the following:

IList<String> files = Directory.GetFiles(path, "*.cshtml", SearchOption.AllDirectories).ToList();

IDictionary<String, Tuple<Int32, String>> items = new Dictionary<String, Tuple<Int32, String>>();

Regex regex = new Regex(@"\[\[\[.*\]\]\]");

foreach (String file in files) {

  Int32 number = 0; // 1-based line number within the current file

  foreach (String line in File.ReadAllLines(file)) {

    number++;

    MatchCollection matches = regex.Matches(line);

    foreach (Match match in matches) {

      items.Add(match.Value, new Tuple<Int32, String>(number, file));

    }

  }

}

NOTE: I am using ReadAllLines because I need to get the line number of each match I find.

Could I have some help with the following:

  1. When using the Regex @"\[\[\[.*\]\]\]" I found a situation where it does not work:

    ViewInfo.Title("[[[Title]]]").Description("[[[Description]]]");

    I get a single match spanning both tokens: [[[Title]]]").Description("[[[Description]]]

  2. I have not been able to apply rules (B) and (C).

  3. Is it possible to improve performance, or is my code OK?

Upvotes: 0

Views: 372

Answers (1)

Lucas Trzesniewski

Reputation: 51330

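  1. You need an ungreedy (lazy) quantifier: .*? consumes as few characters as possible, so the match ends at the first ]]] rather than the last one on the line.

  2. Try this: @"\[\[\[(?:(.*?)\|\|\|.*?|(.*?)///(.*?)|(.*?))\]\]\]". Group 1 holds the text before the first ||| (rule C), groups 2 and 3 hold the text and the comment (rule B), and group 4 holds the plain text (rule A). It is important to put the more specific alternatives first; otherwise the catch-all (.*?) would match every token and the ||| and /// forms would never be split.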

  3. Use File.ReadLines along with a variable you'll increment at each iteration for counting lines. That way you won't have to hold the whole file in memory.
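The three points above can be combined into one pass. The following is a minimal sketch, not the answerer's exact code: the names Nugget, NuggetScanner, ScanLine and ScanDirectory are hypothetical, and the record syntax assumes C# 9 or later. It also swaps the alternation for a single lazy text group followed by an optional suffix, because the alternation form can still span two tokens when a later token on the same line contains |||.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical result type: one entry per token found.
public record Nugget(string File, int Line, string Text, string? Comment);

public static class NuggetScanner
{
    // "text" is everything up to the first "///", "|||" or "]]]".
    // "comment" captures what follows "///" (rule B).
    // Whatever follows "|||" is matched but not captured (rule C).
    private static readonly Regex Pattern = new Regex(
        @"\[\[\[(?<text>.*?)(?:///(?<comment>.*?)|\|\|\|.*?)?\]\]\]",
        RegexOptions.Compiled);

    // Yields every token found on a single line.
    public static IEnumerable<Nugget> ScanLine(string file, int lineNumber, string line)
    {
        foreach (Match m in Pattern.Matches(line))
        {
            Group comment = m.Groups["comment"];
            yield return new Nugget(
                file,
                lineNumber,
                m.Groups["text"].Value,
                comment.Success ? comment.Value : null);
        }
    }

    // Walks all *.cshtml files under path, streaming each file line by line.
    public static List<Nugget> ScanDirectory(string path)
    {
        var found = new List<Nugget>();
        foreach (string file in Directory.EnumerateFiles(path, "*.cshtml", SearchOption.AllDirectories))
        {
            int lineNumber = 0; // 1-based after the first increment
            foreach (string line in File.ReadLines(file)) // lazy, unlike ReadAllLines
            {
                lineNumber++;
                found.AddRange(ScanLine(file, lineNumber, line));
            }
        }
        return found;
    }
}
```

A list is used here instead of the dictionary from the question because Dictionary.Add throws an ArgumentException when the same key is added twice, which would happen as soon as the same token appears on two lines.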

Upvotes: 1
