Reputation: 371
I have an extension method that loops through a string to find all instances of any number of keywords (or search terms). When it finds a match, it wraps the keyword in a span tag so that the keyword is highlighted on display.
public static string HighlightKeywords( this string input, string keywords )
{
    if( input == String.Empty || keywords == String.Empty )
    {
        return input;
    }
    string[] words = keywords.Split( new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries );
    foreach( string word in words )
    {
        input = Regex.Replace( input, word, string.Format( "<span class=\"highlight\">{0}</span>", "$0" ), RegexOptions.IgnoreCase );
    }
    return input;
}
The method works well except when you use a search term that matches the added span tag.
Example of dodgy output:
The string "The class is high"
The keywords: "class high"
Resulting dodgy HTML output: input = "The <span class="<span class="highlight">high</span>light">class</span> is <span class="highlight">high</span>"
So it looks for the first keyword in the original string and adds the decorating HTML, then looks for the next keyword in the already altered string, matches inside the markup it has just added, adds more HTML and creates a mess.
Is there any way to avoid the decorated keywords when searching for each keyword?
UPDATE:
Given that case-insensitivity is important, I explored various case-insensitive replace methods with partial success. The search itself ignored case, but the replacement substituted the casing used in the keywords into the original text, e.g. a search for "HIGH" returns "The class is HIGH". This just looks bad.
So, I returned to using Regex (sigh). I managed to rewrite my extension as follows, which seems to work very well but I wonder how efficient this extension really is. I welcome any comments on improving this code or achieving this without Regex.
public static string HighlightKeywords( this string input, string keywords, string classname )
{
    if( input == String.Empty || keywords == String.Empty )
    {
        return input;
    }
    string[] words = keywords.Split( new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries );
    foreach( string word in words )
    {
        input = Regex.Replace(
            input,
            Regex.Escape( word ),
            string.Format( "<!--{0}-->", Regex.Unescape( "$0" ) ),
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled );
    }
    var s = new StringBuilder( );
    s.Append( input );
    s.Replace( "<!--", "<span class='" + classname + "'>" ).Replace( "-->", "</span>" );
    return s.ToString( );
}
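For comparison, here is a rough sketch of how the same highlighting might be done without Regex, in a single pass over the original text. The class name HighlightSketch and the method name HighlightKeywordsNoRegex are made up for illustration, and the sketch assumes ordinal case-insensitive matching is acceptable. Because it only ever reads the original input, it cannot match inside the markup it emits, and it copies the matched text from the input so the original casing is kept.

```csharp
using System;
using System.Linq;
using System.Text;

public static class HighlightSketch
{
    // Hypothetical helper: scans the original text once and never re-reads its own output.
    public static string HighlightKeywordsNoRegex(this string input, string keywords, string classname)
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrWhiteSpace(keywords))
        {
            return input;
        }

        // Longest keywords first, so a longer keyword wins over a shorter one
        // that starts at the same position.
        string[] words = keywords
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .OrderByDescending(w => w.Length)
            .ToArray();

        var output = new StringBuilder(input.Length * 2);
        int i = 0;
        while (i < input.Length)
        {
            // First keyword that matches at position i, ignoring case (null if none).
            string hit = words.FirstOrDefault(w =>
                i + w.Length <= input.Length &&
                string.Compare(input, i, w, 0, w.Length, StringComparison.OrdinalIgnoreCase) == 0);

            if (hit == null)
            {
                output.Append(input[i]);
                i++;
            }
            else
            {
                // Copy the matched text from the input so its original casing is preserved.
                output.Append("<span class='").Append(classname).Append("'>")
                      .Append(input, i, hit.Length)
                      .Append("</span>");
                i += hit.Length;
            }
        }
        return output.ToString();
    }
}
```

This checks every keyword at every position, so it is only a reasonable trade-off for short texts and a handful of keywords.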
Upvotes: 2
Views: 788
Reputation: 394
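A slightly different approach: split the input into words and compare each word against the keywords. Building the result with a StringBuilder instead of string concatenation would be better!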
public static string HighlightKeywords(this string input, string keywords)
{
    if (input == String.Empty || keywords == String.Empty)
    {
        return input;
    }
    string[] words = keywords.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Select(x => x.ToLower()).ToArray();
    string[] originalWords = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    input = string.Empty;
    foreach (var word in originalWords.Select((value, i) => new { i, value }))
    {
        input += words.Contains(word.value.ToLower()) ? string.Format("<span class=\"highlight\">{0}</span>", word.value) : word.value;
        if (originalWords.Length - 1 != word.i) input += " ";
    }
    return input;
}
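The StringBuilder variant mentioned above might look roughly like the sketch below. The names HighlightWordsSketch and HighlightWholeWords are invented here, and the sketch swaps the ToLower comparison for a HashSet with StringComparer.OrdinalIgnoreCase, which is an assumption about the desired matching. Like the code above, it only highlights whole space-delimited words (so "high." does not match "high") and collapses runs of spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class HighlightWordsSketch
{
    // Hypothetical helper: same word-by-word idea, built with a StringBuilder.
    public static string HighlightWholeWords(this string input, string keywords)
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrEmpty(keywords))
        {
            return input;
        }

        // Case-insensitive set of keywords for quick lookups.
        var wanted = new HashSet<string>(
            keywords.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.OrdinalIgnoreCase);

        var output = new StringBuilder(input.Length * 2);
        foreach (string word in input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Re-insert a single space between words.
            if (output.Length > 0)
            {
                output.Append(' ');
            }

            if (wanted.Contains(word))
            {
                output.Append("<span class=\"highlight\">").Append(word).Append("</span>");
            }
            else
            {
                output.Append(word);
            }
        }
        return output.ToString();
    }
}
```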
Upvotes: -1
Reputation: 117134
Try this simple change:
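public static string HighlightKeywords(this string input, string keywords)
{
    if (string.IsNullOrEmpty(input) || string.IsNullOrWhiteSpace(keywords))
    {
        return input;
    }
    // Join all keywords into one alternation so the original text is scanned once;
    // the inserted markup is never searched. Empty entries are dropped because an
    // empty alternative would match at every position.
    return Regex.Replace(
        input,
        String.Join("|", keywords.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Select(x => Regex.Escape(x))),
        string.Format("<span class=\"highlight\">{0}</span>", "$0"),
        RegexOptions.IgnoreCase);
}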
Let Regex do the work for you.
With your input "The class is high".HighlightKeywords("class high") you get "The <span class="highlight">class</span> is <span class="highlight">high</span>" out.
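If you also want the classname parameter from the updated question, a variation along the following lines might work. This is only a sketch: HighlightRegexSketch and HighlightKeywordsOnePass are invented names, and ordering the keywords longest-first is an added assumption so that a longer keyword is preferred over a shorter one that starts at the same position. It builds the replacement with a MatchEvaluator, so a "$" in the class name is not treated as a substitution token.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HighlightRegexSketch
{
    // Hypothetical helper: one alternation pattern, one pass over the original text.
    public static string HighlightKeywordsOnePass(this string input, string keywords, string classname)
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrWhiteSpace(keywords))
        {
            return input;
        }

        // Escape each keyword so it is matched literally, then join with "|".
        string pattern = string.Join("|",
            keywords.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                    .OrderByDescending(w => w.Length)
                    .Select(Regex.Escape));

        // m.Value is the text as it appears in the input, so its casing is kept.
        return Regex.Replace(
            input,
            pattern,
            m => "<span class='" + classname + "'>" + m.Value + "</span>",
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
}
```

Neither the input nor the class name is HTML-encoded here, so that would need handling separately if either can contain markup.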
Upvotes: 3