user160820

Reputation: 15200

Text Search - Highlighting the Search phrase

What is the best way to highlight a searched phrase within an HTML document?

I have the complete HTML document as one large string in a variable, and I want to highlight the search term only where it occurs in the text, not inside tags.

For example, if the user searches for "img", the img tag should be ignored, but the phrase "img" within the text should be highlighted.

Upvotes: 1

Views: 610

Answers (5)

Ankit Jaiswal

Reputation: 23427

You are presumably using some server-side language to render the search results on the web page.

So the best way I can think of is to highlight the word while rendering it in that server-side language, which may be PHP, Java, or any other language.

This way you work only with the result strings, without HTML and without any parsing overhead.
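As a rough sketch of that idea, assuming the results are available as plain text before any HTML is generated, the rendering step can escape the text and wrap each match as it goes. The names `escapeHtml` and `renderHighlighted` and the choice of `<mark>` are illustrative assumptions, and JavaScript stands in here for whichever server-side language is in use.

```javascript
// Hypothetical helper: escape characters that are special in HTML.
function escapeHtml(s) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return s.replace(/[&<>"']/g, (ch) => map[ch]);
}

// Hypothetical helper: turn a plain-text result into HTML, wrapping every
// case-insensitive occurrence of `term` in <mark>...</mark>.
// Assumes lowercasing does not change the string length (true for ASCII).
function renderHighlighted(text, term) {
  if (term === "") return escapeHtml(text);
  const lower = text.toLowerCase();
  const needle = term.toLowerCase();
  let out = "";
  let pos = 0;
  let hit = lower.indexOf(needle, pos);
  while (hit !== -1) {
    const end = hit + needle.length;
    out += escapeHtml(text.slice(pos, hit));
    out += "<mark>" + escapeHtml(text.slice(hit, end)) + "</mark>";
    pos = end;
    hit = lower.indexOf(needle, pos);
  }
  return out + escapeHtml(text.slice(pos));
}
```

Because the matching happens before escaping, a search for "img" in the plain text `Use <img> for an img` highlights both occurrences, and the angle brackets still end up escaped in the output.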

Upvotes: 0

Christophe

Reputation: 4828

There is a free JavaScript library that might help you out: http://scott.yang.id.au/code/se-hilite/

Upvotes: 0

bobince

Reputation: 536329

Don't use regex.

Because regex cannot parse HTML (or even come close), any attempt to mess around with matching words in an HTML string risks breaking words that appear in markup. A badly-implemented HTML regex hack can even leave you with HTML-injection vulnerabilities which an attacker may be able to leverage to do cross-site-scripting.
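To make the failure concrete, here is a small sketch of the naive approach applied to the "img" example from the question. The function name `naiveHighlight` and the sample markup are made up for illustration.

```javascript
// Naive approach: run a regex over the raw HTML string.
// `term` is assumed to contain no regex metacharacters in this demo.
function naiveHighlight(html, term) {
  return html.replace(new RegExp("(" + term + ")", "gi"), "<b>$1</b>");
}

const page = '<p>An <img src="a.png"> img</p>';
console.log(naiveHighlight(page, "img"));
// prints: <p>An <<b>img</b> src="a.png"> <b>img</b></p>
```

The word in the text is wrapped as intended, but the tag name is wrapped too, so the image element is destroyed.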

Instead you should parse the HTML and do the searches on the text content only.

If you can accept a solution that adds the highlighting from JavaScript on the client side, this is really easy, because the browser will already have parsed the HTML into a bunch of DOM objects you can manipulate. See e.g. this question for a client-side example.

If you have to do it with PHP, that's a bit trickier. The simple solution would be to use DOMDocument::loadHTML and then translate the findText function from the above example into PHP. At least the DOM methods used are standardised, so they work the same.
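For the client-side route, a minimal sketch might look like the following. It is not the findText function from the linked question (which is not reproduced here); `highlightTextNodes` is a hypothetical name, `root` is assumed to be an element such as `document.body`, and the code is meant to run in a browser.

```javascript
// Hypothetical helper: wrap each case-insensitive match of `term` found in
// text nodes under `root` in a <mark> element. Returns the number of matches.
function highlightTextNodes(root, term) {
  const needle = term.toLowerCase();
  if (needle === "") return 0;
  const doc = root.ownerDocument;

  // Collect the text nodes first, because wrapping matches mutates the tree.
  const walker = doc.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  const textNodes = [];
  while (walker.nextNode()) textNodes.push(walker.currentNode);

  let count = 0;
  for (let node of textNodes) {
    const parentTag = node.parentNode ? node.parentNode.nodeName : "";
    if (parentTag === "SCRIPT" || parentTag === "STYLE") continue;

    let idx = node.nodeValue.toLowerCase().indexOf(needle);
    while (idx !== -1) {
      const match = node.splitText(idx);            // starts at the hit
      const rest = match.splitText(needle.length);  // everything after the hit
      const mark = doc.createElement("mark");
      match.parentNode.replaceChild(mark, match);
      mark.appendChild(match);
      count++;
      node = rest;
      idx = node.nodeValue.toLowerCase().indexOf(needle);
    }
  }
  return count;
}
```

Calling `highlightTextNodes(document.body, "img")` only ever touches text nodes, so tag names and attribute values are never candidates for a match.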

Upvotes: 1

jAndy

Reputation: 235962

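var highlight = function (what) {
   // Escape regex metacharacters so the search term is matched literally.
   var escaped = what.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"),
       match   = new RegExp("(" + escaped + ")", "gi"),
       html    = document.body.innerHTML;

   // Caution: this also rewrites matches inside tags and attributes.
   html = html.replace(match, "<span style='background-color: red'>$1</span>");

   document.body.innerHTML = html;
};

highlight('ll');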

This would highlight every occurrence of 'll'.

Be careful when calling highlight() with < or > or any tag name: those would be replaced too, breaking your markup. You might work around that by reading innerText instead of innerHTML, but that way you lose the markup information.

The best way is probably to implement a parser routine yourself.

Example: http://www.jsfiddle.net/DRtVn/
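One possible reading of "a parser routine" is a small hand-written scanner that copies tags through untouched and only searches the text between them. The sketch below makes several assumptions: the names `highlightText` and `wrapMatches` are invented, the input is reasonably well-formed, and comments, CDATA sections, and script or style contents are not given special treatment.

```javascript
// Hypothetical helper: wrap matches of `needle` (already lowercased) in one
// run of text. Assumes lowercasing does not change the string length.
function wrapMatches(text, needle, open, close) {
  const lower = text.toLowerCase();
  let out = "";
  let pos = 0;
  let hit = lower.indexOf(needle, pos);
  while (hit !== -1) {
    const end = hit + needle.length;
    out += text.slice(pos, hit) + open + text.slice(hit, end) + close;
    pos = end;
    hit = lower.indexOf(needle, pos);
  }
  return out + text.slice(pos);
}

// Hypothetical helper: highlight `term` in text only, leaving tags alone.
function highlightText(html, term, open = "<mark>", close = "</mark>") {
  if (term === "") return html;
  const needle = term.toLowerCase();
  let out = "";
  let i = 0;
  while (i < html.length) {
    if (html[i] === "<") {
      // Copy the tag verbatim. Track quotes so that a '>' inside an
      // attribute value does not end the tag early.
      let j = i + 1;
      let quote = null;
      while (j < html.length) {
        const c = html[j];
        if (quote) {
          if (c === quote) quote = null;
        } else if (c === '"' || c === "'") {
          quote = c;
        } else if (c === ">") {
          break;
        }
        j++;
      }
      out += html.slice(i, j + 1);
      i = j + 1;
    } else {
      // A run of text up to the next tag, or to the end of the input.
      let j = html.indexOf("<", i);
      if (j === -1) j = html.length;
      out += wrapMatches(html.slice(i, j), needle, open, close);
      i = j;
    }
  }
  return out;
}
```

The wrapped text is always a slice of the original document rather than the user's search string, so the search term itself is never written into the output.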

Upvotes: 0

BjornS

Reputation: 1024

Edit: This was tagged as Java before, so this answer might not be applicable.

This is quick and dirty but it might work for you, or at least be a starting point

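// Requires: import java.util.regex.Pattern;
private String highlight(String search, String html) {
    // Pattern.quote keeps regex metacharacters in the search term literal.
    return html.replaceAll("(>[^<]*)(" + Pattern.quote(search) + ")([^>]*<)", "$1<em>$2</em>$3");
}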

This requires testing, and I make no guarantees that it's correct, but the simplest way to explain it is that you ensure your term sits between two tags and is thus not itself a tag or part of a tag attribute.

Upvotes: 0
