I have some multiline text and I want to find the lines that contain a specific word.
In the current implementation I only get the word, but instead I would like to get the whole line. Here's the code:
var finder = new Regex(@"(^|\W)" + Regex.Escape(wordToFind) + @"(\W|$)", RegexOptions.IgnoreCase);
foreach (var match in finder.Matches(multilineString))
{
//match should be the whole line
}
Example:
If Request.QueryString("bar") <> "" Then
Set bar= foo("baz")
Else
Set bar= foo("baz2")
End If
If I look for foo
I should get:
Set bar= foo("baz")
Set bar= foo("baz2")
I didn't write the regex myself and I'm not very familiar with regular expressions, so I would appreciate any hints that help me keep investigating.
Thanks
Upvotes: 1
Nolonar's solution does not handle the case where a line starts or ends with the required word.
Moreover, remember that the ^ and $ anchors match the start/end of the whole string unless you pass the RegexOptions.Multiline option to make them match line boundaries.
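To make the effect of RegexOptions.Multiline concrete, here is a minimal sketch written as C# top-level statements (C# 9 or later). The two-line input and the ^\w+$ pattern are made up for this illustration and are not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical two-line input, used only for this illustration.
string text = "foo\nbar";

// Without RegexOptions.Multiline, ^ and $ anchor to the whole string,
// so a pattern that must cover exactly one line finds nothing here.
int withoutMultiline = Regex.Matches(text, @"^\w+$").Count;

// With RegexOptions.Multiline, ^ and $ also match at line boundaries,
// so each line can match on its own.
int withMultiline = Regex.Matches(text, @"^\w+$", RegexOptions.Multiline).Count;

Console.WriteLine($"without Multiline: {withoutMultiline}, with Multiline: {withMultiline}");
```

The only difference between the two calls is the option, which is why the option (or the inline (?m) modifier) matters whenever a pattern is meant to work line by line.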
Hence, the correct regex-only solution to extract all lines containing a whole word is
var finder = new Regex($@"^.*?(?<!\w){Regex.Escape(wordToFind)}(?!\w).*", RegexOptions.IgnoreCase | RegexOptions.Multiline);
// Or, in order to avoid getting CR at the end of the extracted lines
// var finder = new Regex($@"^.*?(?<!\w){Regex.Escape(wordToFind)}(?!\w)[^\r\n]*", RegexOptions.IgnoreCase | RegexOptions.Multiline);
var results = finder.Matches(multilineString).Cast<Match>().Select(x => x.Value); // Use x.Value.Trim() to trim the result
Note you may "shrink" the code a bit by incorporating RegexOptions.IgnoreCase | RegexOptions.Multiline into the pattern itself using the inline modifiers (?im):
var finder = new Regex($@"(?im)^.*?(?<!\w){Regex.Escape(wordToFind)}(?!\w).*");
// Or, in order to avoid getting CR at the end of the extracted lines
// var finder = new Regex($@"(?im)^.*?(?<!\w){Regex.Escape(wordToFind)}(?!\w)[^\r\n]*");
See the regex demo
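As a self-contained sketch of the approach above, the following runs the [^\r\n]* variant against the sample text from the question. It is written as C# top-level statements (C# 9 or later). Joining the sample lines with CRLF is an assumption made here to mimic Windows line endings, and the results variable name is only illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample text from the question. Joining with CRLF is an assumption
// made for this sketch, to mimic Windows line endings.
string multilineString = string.Join("\r\n", new[]
{
    "If Request.QueryString(\"bar\") <> \"\" Then",
    "Set bar= foo(\"baz\")",
    "Else",
    "Set bar= foo(\"baz2\")",
    "End If"
});
string wordToFind = "foo";

// [^\r\n]* stops before the line break, so no trailing CR ends up in a match.
var finder = new Regex(
    $@"^.*?(?<!\w){Regex.Escape(wordToFind)}(?!\w)[^\r\n]*",
    RegexOptions.IgnoreCase | RegexOptions.Multiline);

string[] results = finder.Matches(multilineString)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string line in results)
{
    Console.WriteLine(line);
}
```

With this input, the two Set bar= ... lines are the only ones that contain foo as a whole word, so they are the only values in results.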
Pattern details
^ - start of a line
.*? - any 0+ chars other than a newline char, as few as possible (*? is a lazy, non-greedy quantifier)
(?<!\w) - the left-hand side word boundary
{Regex.Escape(wordToFind)} - an escaped version of the wordToFind string
(?!\w) - the right-hand side word boundary
.* - any 0+ chars other than a newline char, as many as possible (* is a greedy quantifier).
NOTE: . matches a carriage return, \r, in .NET regex, hence my suggestion to .Trim() the extracted values. Or use [^\r\n]* instead to match 0 or more chars other than CR and LF.
Upvotes: 0
You can try with this regex:
Regex regex = new Regex(@"^.*?\W" + Regex.Escape(wordToFind) + @"\W.*?$");
The ^ matches the start of the string and the $ matches the end of the string. They match the start and end of each line only when RegexOptions.Multiline is passed.
The .*? matches everything except a newline (but as little as possible), and \W (uppercase "W") matches any non-word character (a character that is neither a letter, a digit, nor an underscore).
Alternatively, you can use \s (lowercase "s") instead of \W if you want your words to be separated by whitespace only.
Here is a good reference for Regex.
Upvotes: 1
You can do it like this:
// Note: this only splits correctly when the text uses the platform's line ending (Environment.NewLine)
string[] lines = multilineString.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
List<string> validString = new List<string>();
foreach (string s in lines)
{
    // Keep every line in which the regex from the question finds the word
    if (finder.IsMatch(s))
    {
        validString.Add(s);
    }
}
Alternatively, the same thing with LINQ (requires using System.Linq):
List<string> lines = multilineString.Split(new string[] { Environment.NewLine }, StringSplitOptions.None).ToList();
List<string> validString = lines.Where(x => finder.IsMatch(x)).ToList();
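One possible refinement, sketched below as C# top-level statements (C# 9 or later), is to split on both "\r\n" and "\n" so the result does not depend on which line ending the text uses. The mixed-ending input is hypothetical, and finder is rebuilt here from the regex in the question so that the snippet stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input that mixes CRLF and LF line endings.
string multilineString = "Set bar= foo(\"baz\")\r\nElse\nfoo at line start\nfood";
string wordToFind = "foo";

// The regex from the question, applied to one line at a time.
var finder = new Regex(
    @"(^|\W)" + Regex.Escape(wordToFind) + @"(\W|$)",
    RegexOptions.IgnoreCase);

// Splitting on both "\r\n" and "\n" handles either line-ending style.
string[] lines = multilineString.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
List<string> validString = lines.Where(line => finder.IsMatch(line)).ToList();

foreach (string line in validString)
{
    Console.WriteLine(line);
}
```

Because each line is matched separately, ^ and $ already refer to the start and end of that line, so RegexOptions.Multiline is not needed with this approach.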
Upvotes: 0