Reputation: 489
I wrote a method that highlights keywords in an HTML string. It returns the updated string and a list of the matched keywords. I want to match a keyword whether it appears as a whole word or surrounded by dashes. However, when it appears with dashes, the dashes are highlighted and returned along with the word.
For example, if the word is "locks" and the HTML contains "He -locks- the door", then the dashes around the word are also highlighted:
He <span style=\"background-color:yellow\">-locks-</span> the door.
Instead of:
He -<span style=\"background-color:yellow\">locks</span>- the door.
In addition, the returned list contains "-locks-" instead of "locks".
What can I do to get my expected result?
Here is my code:
private static List<string> FindKeywords(IEnumerable<string> words, bool bHighlight, ref string text)
{
    HashSet<String> matchingKeywords = new HashSet<string>(new CaseInsensitiveComparer());
    string allWords = "\\b(-)?(" + words.Aggregate((list, word) => list + "|" + word) + ")(-)?\\b";
    Regex regex = new Regex(allWords, RegexOptions.Compiled | RegexOptions.IgnoreCase);
    foreach (Match match in regex.Matches(text))
    {
        matchingKeywords.Add(match.Value);
    }
    if (bHighlight)
    {
        text = regex.Replace(text, string.Format("<span style=\"background-color:yellow\">{0}</span>", "$0"));
    }
    return matchingKeywords.ToList();
}
Upvotes: 2
Views: 866
Reputation: 627101
You need to use match.Groups[2].Value instead of match.Value, because your regex has 3 capturing groups and the second one contains the keyword that you want to highlight:
foreach (Match match in regex.Matches(text))
{
    matchingKeywords.Add(match.Groups[2].Value);
}
if (bHighlight)
{
    text = regex.Replace(text, string.Format("$1<span style=\"background-color:yellow\">{0}</span>$3", "$2"));
}
match.Groups[2].Value is used in the foreach loop, and $2 in the regex.Replace replacement string refers to the keyword captured by the second group. $1 and $3 are the optional hyphens around the highlighted word (each captured with (-)?).
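For reference, here is a minimal sketch of the whole method with both changes applied. A few things in it are assumptions rather than part of the original code: the class name KeywordHighlighter is hypothetical, StringComparer.OrdinalIgnoreCase stands in for the CaseInsensitiveComparer from the question, each keyword is passed through Regex.Escape so that regex metacharacters in it match literally, and the replacement string is written directly instead of going through string.Format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordHighlighter
{
    // Sketch of FindKeywords using group 2 for the keyword.
    // Assumptions: Regex.Escape and StringComparer.OrdinalIgnoreCase
    // are substitutions that do not appear in the original code.
    public static List<string> FindKeywords(IEnumerable<string> words, bool highlight, ref string text)
    {
        var matchingKeywords = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        // Build "word1|word2|..." with each keyword escaped.
        string alternation = string.Join("|", words.Select(w => Regex.Escape(w)));

        // Group 1: optional leading hyphen, group 2: keyword, group 3: optional trailing hyphen.
        var regex = new Regex(@"\b(-)?(" + alternation + @")(-)?\b", RegexOptions.IgnoreCase);

        foreach (Match match in regex.Matches(text))
        {
            // Record only the keyword, without any captured hyphens.
            matchingKeywords.Add(match.Groups[2].Value);
        }

        if (highlight)
        {
            // $1 and $3 put any captured hyphens back outside the span; $2 is the keyword.
            text = regex.Replace(text, "$1<span style=\"background-color:yellow\">$2</span>$3");
        }

        return matchingKeywords.ToList();
    }
}
```

Calling it with the single keyword locks and the text He -locks- the door should leave text as He -<span style="background-color:yellow">locks</span>- the door and return a list containing only locks.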
Upvotes: 2