margabit

Reputation: 2954

Get whole line where Regex matches

I have some multiline text and I want to find the lines that contain a specific word.

In the current implementation I only get the word, but instead I would like to get the whole line. Here's the code:

var finder = new Regex(@"(^|\W)" + Regex.Escape(wordToFind) + @"(\W|$)", RegexOptions.IgnoreCase);
foreach (Match match in finder.Matches(multilineString))
{
    // match.Value should be the whole line
}

Example:

If Request.QueryString("bar") <> "" Then
    Set bar= foo("baz")
Else
    Set bar= foo("baz2")
End If

If I look for foo I should get:

Set bar= foo("baz")
Set bar= foo("baz2")

I didn't write the regex myself and I'm not very familiar with regular expressions, so I would appreciate any hints that help me keep investigating.

Thanks

Upvotes: 1

Views: 6368

Answers (3)

Wiktor Stribiżew

Reputation: 627468

Nolonar's solution does not handle the case where a line starts or ends with the required word.

Moreover, you need to remember that ^ and $ anchors match the start/end of the whole string unless you pass the RegexOptions.Multiline option to make them match line boundaries.
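
To illustrate the anchor behavior, here is a minimal sketch. The class and method names (AnchorDemo, CountLineStarts) and the sample text are made up for this example; only Regex.Matches and RegexOptions come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original code.
public static class AnchorDemo
{
    // Counts how many times "^foo" matches, with or without RegexOptions.Multiline.
    public static int CountLineStarts(string text, bool multiline)
    {
        RegexOptions options = multiline ? RegexOptions.Multiline : RegexOptions.None;
        return Regex.Matches(text, "^foo", options).Count;
    }

    public static void Main()
    {
        string text = "foo\nfoo\nfoo";
        // Without Multiline, ^ matches only at the very start of the string.
        Console.WriteLine(CountLineStarts(text, multiline: false)); // 1
        // With Multiline, ^ also matches right after every line feed.
        Console.WriteLine(CountLineStarts(text, multiline: true));  // 3
    }
}
```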

Hence, a correct regex-only solution for extracting all lines that contain the whole word is:

var finder = new Regex($@"^.*?(?<!\w){Regex.Escape(wordToFind)}(?!\w).*", RegexOptions.IgnoreCase | RegexOptions.Multiline);
// Or, in order to avoid getting CR at the end of the extracted lines
// var finder = new Regex($@"^.*?(?<!\w){Regex.Escape(wordToFind)}(?!\w)[^\r\n]*", RegexOptions.IgnoreCase | RegexOptions.Multiline);
var results = finder.Matches(multilineString).Cast<Match>().Select(x => x.Value); // Use x.Value.Trim() to trim the result

Note you may "shrink" the code a bit by incorporating RegexOptions.IgnoreCase | RegexOptions.Multiline into the pattern itself using inline modifiers, (?im):

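var finder = new Regex($@"(?im)^.*?(?<!\w){Regex.Escape(wordToFind)}(?!\w).*");
//                        ^^^^^ inline modifiers: i = IgnoreCase, m = Multiline
// Or, in order to avoid getting CR at the end of the extracted lines
// var finder = new Regex($@"(?im)^.*?(?<!\w){Regex.Escape(wordToFind)}(?!\w)[^\r\n]*");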

See the regex demo

Pattern details

  • ^ - start of a line
  • .*? - any 0+ chars other than a newline char, as few as possible (*? is a lazy, non-greedy quantifier)
  • (?<!\w) - a negative lookbehind acting as the left-hand word boundary: the match fails if a word char immediately precedes the word
  • {Regex.Escape(wordToFind)} - the wordToFind string with any regex metacharacters escaped
  • (?!\w) - a negative lookahead acting as the right-hand word boundary: the match fails if a word char immediately follows the word
  • .* - any 0+ chars other than a newline (LF) char, as many as possible (* is a greedy quantifier). NOTE: in .NET regex, . does match a carriage return (\r), hence my suggestion to .Trim() the extracted values. Alternatively, use [^\r\n]* to match 0 or more chars other than CR and LF.
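
Putting the pieces together, the following is a small runnable sketch of the approach applied to the sample text from the question. The LineFinder class and FindLines method are illustrative names, and the sample is assumed to use CRLF line endings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class wrapping the regex described above.
public static class LineFinder
{
    // Returns every line of text that contains word as a whole word (case-insensitive).
    public static List<string> FindLines(string text, string word)
    {
        var finder = new Regex(
            $@"^.*?(?<!\w){Regex.Escape(word)}(?!\w)[^\r\n]*",
            RegexOptions.IgnoreCase | RegexOptions.Multiline);

        return finder.Matches(text)
                     .Cast<Match>()
                     .Select(m => m.Value.Trim()) // Trim() drops the leading indentation
                     .ToList();
    }

    public static void Main()
    {
        // Sample from the question, assumed to use CRLF line endings.
        string text =
            "If Request.QueryString(\"bar\") <> \"\" Then\r\n" +
            "    Set bar= foo(\"baz\")\r\n" +
            "Else\r\n" +
            "    Set bar= foo(\"baz2\")\r\n" +
            "End If";

        foreach (string line in FindLines(text, "foo"))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // Set bar= foo("baz")
        // Set bar= foo("baz2")
    }
}
```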

Upvotes: 0

Nolonar

Reputation: 6132

You can try with this regex:

Regex regex = new Regex(@"^.*?\W" + Regex.Escape(wordToFind) + @"\W.*?$", RegexOptions.Multiline);

With RegexOptions.Multiline, ^ matches the start of the string or of any line, and $ matches the end of the string or of any line; without that option they match only at the start and end of the whole string.
The .*? matches any characters except a newline (as few as possible), and \W (uppercase "W") matches any non-word character (a character that is not a letter, a digit, or an underscore).

Alternatively, you can use \s (lowercase "s") instead of \W if you want your words to be separated by whitespace only.
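
As a rough illustration of the difference between the two separators, here is a short sketch run against one of the lines from the question. SeparatorDemo and HasWord are names invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original code.
public static class SeparatorDemo
{
    // True if word occurs in line with the given separator class on both sides.
    public static bool HasWord(string line, string word, string separator)
    {
        string pattern = separator + Regex.Escape(word) + separator;
        return Regex.IsMatch(line, pattern);
    }

    public static void Main()
    {
        string line = "    Set bar= foo(\"baz\")";
        // " foo(" : a space before and "(" after are both non-word characters.
        Console.WriteLine(HasWord(line, "foo", @"\W")); // True
        // "(" is not whitespace, so the \s version does not match this line.
        Console.WriteLine(HasWord(line, "foo", @"\s")); // False
    }
}
```

For the sample in the question, where foo is followed directly by a parenthesis, \W is therefore the separator that finds the line.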

Here is a good reference for Regex.

Upvotes: 1

Ehsan

Reputation: 32719

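You can split the text into lines and test each line with your existing regex:

// Assumes the text uses the platform's line ending (Environment.NewLine)
string[] lines = multilineString.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
List<string> validString = new List<string>();
foreach (string s in lines)
{
    // Keep only the lines in which the word is found
    if (finder.IsMatch(s))
    {
        validString.Add(s);
    }
}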

Alternatively, the same thing can be written with LINQ, which should work as well:

List<string> lines = multilineString.Split(new string[] { Environment.NewLine }, StringSplitOptions.None).ToList();
List<string> validString = lines.Where(x => finder.IsMatch(x)).ToList();
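
If the text may not use the platform's line ending, a variant of the split approach could split on both "\r\n" and "\n". The sketch below is one way to do that, reusing the regex from the question; SplitFinder and FindLines are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original code.
public static class SplitFinder
{
    // Splits text into lines (CRLF or LF) and keeps the lines that contain word.
    public static List<string> FindLines(string text, string word)
    {
        // Same pattern as in the question: the word delimited by non-word chars or line ends.
        var finder = new Regex(@"(^|\W)" + Regex.Escape(word) + @"(\W|$)", RegexOptions.IgnoreCase);

        // "\r\n" is listed first so that a CRLF pair is treated as one separator.
        return text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
                   .Where(line => finder.IsMatch(line))
                   .ToList();
    }

    public static void Main()
    {
        string text = "x\r\n  foo(1)\nfood\nFOO";
        foreach (string line in FindLines(text, "foo"))
        {
            Console.WriteLine(line);
        }
    }
}
```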

Upvotes: 0
