Scho

Reputation: 371

Highlight multiple keywords in a string ignoring the added HTML in C#

I have an extension that loops through a string to find all instances of any number of keywords (or search terms). When it finds a match, it adds a span tag around each keyword to highlight the keywords on display.

    public static string HighlightKeywords( this string input, string keywords )
    {
        if( input == String.Empty || keywords == String.Empty )
        {
            return input;
        }

        string[] words = keywords.Split( new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries );

        foreach( string word in words )
        {
            input = Regex.Replace( input, word, string.Format( "<span class=\"highlight\">{0}</span>", "$0" ), RegexOptions.IgnoreCase );
        }
        return input;
    }

The method works well except when you use a search term that matches the added span tag.

Example of dodgy output:

The string "The class is high"

The keywords: "class high"

Resulting dodgy HTML output: input = "The <span class="<span class="highlight">high</span>light">class</span> is <span class="highlight">high</span>"

So it looks for the first keyword in the original string and adds the decorating HTML, then looks for the next keyword in the altered string, where it also matches text inside the markup it just added, and creates a mess.

Is there any way to avoid the decorated keywords when searching for each keyword?

UPDATE:

Given that case-insensitivity is important, I explored various case-insensitive replace methods with partial success. The search ignored case, but it substituted the casing used in the keywords into the original text, e.g. a search for "HIGH" on "The class is high" returns "The class is HIGH". This just looks bad.

So, I returned to using Regex (sigh). I managed to rewrite my extension as follows, which seems to work very well but I wonder how efficient this extension really is. I welcome any comments on improving this code or achieving this without Regex.

    public static string HighlightKeywords( this string input, string keywords, string classname )
    {
        if( input == String.Empty || keywords == String.Empty )
        {
            return input;
        }

        string[] words = keywords.Split( new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries );

        foreach( string word in words )
        {
            input = Regex.Replace( input, Regex.Escape( word ), string.Format( "<!--{0}-->", Regex.Unescape( "$0" ) ), RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled );
        }

        var s = new StringBuilder( );
        s.Append( input );
        s.Replace( "<!--", "<span class='" + classname + "'>" ).Replace( "-->", "</span>" );

        return s.ToString( );
    }

Upvotes: 2

Views: 788

Answers (2)

MysteriousLab

Reputation: 394

A slightly different approach: split the input into words and wrap each word that matches a keyword. Using a StringBuilder would be better!

public static string HighlightKeywords(this string input, string keywords)
{
  if (input == String.Empty || keywords == String.Empty)
  {
    return input;
  }

  string[] words = keywords.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Select(x => x.ToLower()).ToArray();
  string[] originalWords = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
  input = string.Empty;

  foreach (var word in originalWords.Select((value, i) => new { i, value }))
  {
    input += words.Contains(word.value.ToLower()) ? string.Format("<span class=\"highlight\">{0}</span>", word.value) : word.value;
    if (originalWords.Length - 1 != word.i) input += " ";
  }
  return input;
}
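
The StringBuilder variant mentioned above might look roughly like the following. This is only a sketch: the class name `HighlightSketch` and the method name `HighlightWholeWords` are made up for illustration, and a case-insensitive `HashSet<string>` is swapped in for the `ToLower()` comparisons.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class HighlightSketch
{
    // Hypothetical name; same word-by-word idea as above, built with a StringBuilder.
    public static string HighlightWholeWords(this string input, string keywords)
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrEmpty(keywords))
        {
            return input;
        }

        // Case-insensitive keyword set, so no ToLower() calls are needed.
        var wanted = new HashSet<string>(
            keywords.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.OrdinalIgnoreCase);

        var result = new StringBuilder(input.Length);
        foreach (string word in input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Re-insert a single space between words.
            if (result.Length > 0) result.Append(' ');

            if (wanted.Contains(word))
                result.Append("<span class=\"highlight\">").Append(word).Append("</span>");
            else
                result.Append(word);
        }
        return result.ToString();
    }
}
```

Like the version above, this only matches whole space-delimited words (so "high." with trailing punctuation is not highlighted) and collapses runs of spaces into one.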

Upvotes: -1

Enigmativity

Reputation: 117134

Try this simple change:

public static string HighlightKeywords(this string input, string keywords)
{
    if (input == String.Empty || keywords == String.Empty)
    {
        return input;
    }

    return Regex.Replace(
        input,
        String.Join("|", keywords.Split(' ').Select(x => Regex.Escape(x))),
        string.Format("<span class=\"highlight\">{0}</span>", "$0"),
        RegexOptions.IgnoreCase);
}

Let Regex do the work for you.

With your input "The class is high".HighlightKeywords("class high") you get "The <span class="highlight">class</span> is <span class="highlight">high</span>" out.
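
If the keywords can contain repeated spaces, or one keyword is a prefix of another, a slightly more defensive take on the same single-pass idea could look like this. It is a sketch under assumptions, not part of the answer above: the names `HighlightExtensions` and `HighlightAll` and the optional `classname` parameter are hypothetical, and the longest-first ordering is an added design choice.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HighlightExtensions
{
    // Hypothetical name; classname is an assumed optional parameter.
    public static string HighlightAll(this string input, string keywords, string classname = "highlight")
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrWhiteSpace(keywords))
        {
            return input;
        }

        // Drop empty entries (an empty alternative would match at every position)
        // and try longer keywords first so "highlight" is preferred over "high".
        string pattern = string.Join("|", keywords
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .OrderByDescending(w => w.Length)
            .Select(w => Regex.Escape(w)));

        // One pass over the original input: the inserted markup is never searched.
        return Regex.Replace(
            input,
            pattern,
            m => "<span class=\"" + classname + "\">" + m.Value + "</span>",
            RegexOptions.IgnoreCase);
    }
}
```

Because the replacement uses the matched text (`m.Value`), the original casing of the input is preserved even when the keyword is typed in a different case.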

Upvotes: 3
