dole

Reputation: 433

Highlight keywords in a paragraph

I need to highlight keywords in a paragraph, as Google does in its search results. Let's assume that I have a MySQL database with blog posts. When a user searches for a certain keyword, I want to return the posts that contain it, but show only part of each post (the paragraph that contains the searched keyword), with the keyword highlighted.

My plan is this:

Can you help me with the logic, or at least tell me whether my logic is OK? I'm still learning PHP.

Upvotes: 2

Views: 5390

Answers (8)

François Zaninotto

Reputation: 7335

Browsers have a native API for doing this client-side: the CSS Custom Highlight API. It requires a bit of JavaScript, but with a third-party library like highlight-search-term, it's a one-liner:

<script type="module">
import { highlightSearchTerm } from "https://cdn.jsdelivr.net/npm/[email protected]/src/index.js";
highlightSearchTerm({ search: 'KEYWORD',  selector: ".content" });
</script>

Put the snippet above at the end of the page body, replacing KEYWORD with your search keyword and .content with the CSS selector of the page element(s) in which you want to highlight words.

More examples at https://www.npmjs.com/package/highlight-search-term

Upvotes: 0

Andrew Fox

Reputation: 890

I found this post when doing a search for how to highlight keyword search results. My requirements were:

  • Must be whole words
  • Must work for more than one keyword
  • Must be PHP only

I am fetching my data from a MySQL database which, by design of the form that stores the data, doesn't contain HTML elements.

Here is the code I found most useful:

$keywords = array("fox", "jump", "quick");
$string = "The quick brown fox jumps over the lazy dog";
$original = $string; // kept to check at the end whether anything matched

if (!empty($keywords)) { // For keyword search this will highlight all keywords in the results.
    foreach ($keywords as $word) {
        // preg_quote() escapes regex metacharacters in the keyword;
        // $0 re-inserts the matched text, so its original case is kept.
        $pattern = "/\b" . preg_quote($word, "/") . "\b/i";
        $string = preg_replace($pattern, '<span class="highlight">$0</span>', $string);
    }
}
// Compare the original string with the string altered in the loop, to avoid printing a string with no matches.
if ($string === $original) {
    echo "No match";
} else {
    echo $string;
}

Output:

The <span class="highlight">quick</span> brown <span class="highlight">fox</span> jumps over the lazy dog
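One caveat with the loop above: each keyword gets its own preg_replace() pass, so a later keyword such as "span", "class" or "highlight" also matches inside the <span> markup inserted by an earlier pass. A minimal single-pass sketch, assuming the input is plain text as in this answer; highlightKeywords() is a hypothetical helper name. It joins all keywords into one alternation, splits the raw text on it, and HTML-escapes every piece before wrapping the matches:

```php
<?php
// Hypothetical helper: highlight whole-word keywords in plain text in a single pass.
function highlightKeywords(string $text, array $keywords): string
{
    $keywords = array_filter($keywords, 'strlen'); // ignore empty keywords
    if (!$keywords) {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }
    // Escape regex metacharacters and join all keywords into one alternation;
    // \b = whole words only, i = case-insensitive.
    $quoted = array_map(function ($k) { return preg_quote($k, '/'); }, $keywords);
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';
    // With one capturing group, preg_split() puts the matched keywords at odd indexes.
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $html = '';
    foreach ($parts as $i => $part) {
        // Escape every piece, then wrap only the matches (their original case is kept).
        $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        $html .= ($i % 2 === 1) ? '<span class="highlight">' . $safe . '</span>' : $safe;
    }
    return $html;
}

echo highlightKeywords('The quick brown fox jumps over the lazy dog', ['fox', 'jump', 'quick', 'span']), "\n";
// prints: The <span class="highlight">quick</span> brown <span class="highlight">fox</span> jumps over the lazy dog
```

Because matching happens once, on the original text, the "span" keyword in the usage line is never found inside the inserted tags.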

I hope this helps someone.

Upvotes: 1

Gumbo

Reputation: 655319

Here’s a solution for plain text:

$str = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';
$keywords = array('co');
$wordspan = 5;
$keywordsPattern = implode('|', array_map(function($val) { return preg_quote($val, '/'); }, $keywords));
$matches = preg_split("/($keywordsPattern)/ui", $str, -1, PREG_SPLIT_DELIM_CAPTURE);
for ($i = 0, $n = count($matches); $i < $n; ++$i) {
    if ($i % 2 == 0) {
        $words = preg_split('/(\s+)/u', $matches[$i], -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($words) > ($wordspan+1)*2) {
            $matches[$i] = '…';
            if ($i > 0) {
                $matches[$i] = implode('', array_slice($words, 0, ($wordspan+1)*2)) . $matches[$i];
            }
            if ($i < $n-1) {
                $matches[$i] .= implode('', array_slice($words, -($wordspan+1)*2));
            }
        }
    } else {
        $matches[$i] = '<b>'.$matches[$i].'</b>';
    }
}
echo implode('', $matches);

With the current pattern "/($keywordsPattern)/ui", subwords (parts of words) are matched and highlighted, but you can change that if you want to:

  • If you want to match only whole words and not just subwords, use word boundaries \b:

    "/\b($keywordsPattern)\b/ui"
    
  • If you want to match subwords but highlight the whole word, put optional word characters \w before and after the keywords:

    "/(\w*?(?:$keywordsPattern)\w*)/ui"
    
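To compare the three variants side by side, here is a small sketch that builds each pattern from the same keyword list; keywordPattern() and the mode names 'sub', 'whole' and 'expand' are hypothetical, chosen only for this illustration:

```php
<?php
// Hypothetical helper: build one of the three pattern variants from a keyword list.
function keywordPattern(array $keywords, string $mode): string
{
    // Same quoting as in the answer: escape regex metacharacters, join with |.
    $keywordsPattern = implode('|', array_map(function ($val) {
        return preg_quote($val, '/');
    }, $keywords));
    switch ($mode) {
        case 'whole':  // whole words only
            return '/\b(' . $keywordsPattern . ')\b/ui';
        case 'expand': // match subwords, capture the whole word
            return '/(\w*?(?:' . $keywordsPattern . ')\w*)/ui';
        default:       // 'sub': match subwords, capture only the keyword
            return '/(' . $keywordsPattern . ')/ui';
    }
}

$text = 'ullamco laboris nisi ut aliquip ex ea commodo consequat';
foreach (['sub', 'whole', 'expand'] as $mode) {
    echo $mode, ': ', preg_replace(keywordPattern(['co'], $mode), '<b>$1</b>', $text), "\n";
}
// sub: ullam<b>co</b> laboris nisi ut aliquip ex ea <b>co</b>mmodo <b>co</b>nsequat
// whole: ullamco laboris nisi ut aliquip ex ea commodo consequat
// expand: <b>ullamco</b> laboris nisi ut aliquip ex ea <b>commodo</b> <b>consequat</b>
```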

Upvotes: 1

ircmaxell

Reputation: 165201

If the text contains HTML (this is a fairly robust solution):

$string = '<p>foo<b>bar</b></p>';
$keyword = 'foo';
$dom = new DOMDocument();
$dom->loadHTML($string);
$xpath = new DOMXPath($dom);
// Select the text nodes themselves: the result is a snapshot, so replacing
// nodes inside the loop is safe, and the keyword never has to be put into
// the XPath expression (where a " in it would break the query).
foreach ($xpath->query('//body//text()') as $child) {
    $text = $child->textContent;
    if (stripos($text, $keyword) === false) continue;
    $fragment = $dom->createDocumentFragment();
    while (($pos = stripos($text, $keyword)) !== false) {
        if ($pos > 0) $fragment->appendChild($dom->createTextNode(substr($text, 0, $pos)));
        $highlight = $dom->createElement('span');
        $highlight->setAttribute('class', 'highlight');
        $highlight->appendChild($dom->createTextNode(substr($text, $pos, strlen($keyword))));
        $fragment->appendChild($highlight);
        $text = substr($text, $pos + strlen($keyword));
    }
    if ($text !== '') $fragment->appendChild($dom->createTextNode($text)); // empty() would drop "0"
    $child->parentNode->replaceChild($fragment, $child);
}
$string = $dom->saveXML($dom->getElementsByTagName('body')->item(0)->firstChild);

Results in:

<p><span class="highlight">foo</span><b>bar</b></p>

And with:

$string = '<body><p>foobarbaz<b>bar</b></p></body>';
$keyword = 'bar';

You get (broken onto multiple lines for readability):

<p>foo
    <span class="highlight">bar</span>
    baz
    <b>
        <span class="highlight">bar</span>
    </b>
</p>

Beware of non-DOM solutions (like regex or str_replace): highlighting something like "div" tends to completely destroy your HTML, because the tag names get replaced too. This approach only ever highlights strings in the body text, never inside a tag.
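A quick way to see the problem, using a made-up snippet: str_replace() has no notion of tags, so the tag names are rewritten along with the text.

```php
<?php
// Illustration only: a plain string replacement cannot tell tag names from text content.
$html = '<div class="post">Put the div here</div>';
$naive = str_replace('div', '<span class="highlight">div</span>', $html);
echo $naive, "\n";
// prints: <<span class="highlight">div</span> class="post">Put the <span class="highlight">div</span> here</<span class="highlight">div</span>>
```

Both the opening and the closing <div> tags are mangled, which is why the DOM version above only touches text nodes.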


Edit: Since you want Google-style results, here's one way of doing it:

function getKeywordStubs($string, array $keywords, $maxStubSize = 10) {
    $dom = new DomDocument();
    $dom->loadHtml($string);
    $xpath = new DomXpath($dom);
    $results = array();
    $maxStubHalf = ceil($maxStubSize / 2);
    foreach ($keywords as $keyword) {
        $elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
        $replace = '<span class="highlight">'.$keyword.'</span>';
        foreach ($elements as $element) {
            $stub = $element->textContent;
            $regex = '#^.*?((\w*\W*){'.
                 $maxStubHalf.'})('.
                 preg_quote($keyword, '#').
                 ')((\w*\W*){'.
                 $maxStubHalf.'}).*?$#ims';
            $stub = preg_replace($regex, '\\1\\3\\4', $stub);
            $stub = str_ireplace($keyword, $replace, $stub);
            $results[] = $stub;
        }
    }
    $results = array_unique($results);
    return $results;
}

OK, so that returns an array of matches, each with up to $maxStubSize words around the keyword (up to half that number before it and half after)...

So, given a string:

<p>a whole 
    <b>bunch of</b> text 
    <a>here for</a> 
    us to foo bar baz replace out from this string
    <b>bar</b>
</p>

Calling getKeywordStubs($string, array('bar', 'bunch')) will result in:

array(4) {
  [0]=>
  string(75) "here for us to foo <span class="highlight">bar</span> baz replace out from "
  [3]=>
  string(34) "<span class="highlight">bar</span>"
  [4]=>
  string(62) "a whole <span class="highlight">bunch</span> of text here for "
  [7]=>
  string(39) "<span class="highlight">bunch</span> of"
}

You could then build your result blurb by sorting the list by length (strlen) and picking the two longest matches (this needs PHP 5.3+ for the anonymous function):

usort($results, function($str1, $str2) { 
    return strlen($str2) - strlen($str1);
});
$description = implode('...', array_slice($results, 0, 2));

Which results in:

here for us to foo <span class="highlight">bar</span> baz replace out...a whole <span class="highlight">bunch</span> of text here for 

I hope that helps... (I do feel this is a bit... bloated... I'm sure there are better ways to do this, but here's one way)...

Upvotes: 9

kevtrout

Reputation: 4984

You could split each database search result into an array of words using explode() and then use array_search() on it. Set the $distance variable in the example below to how many words you'd like to appear on either side of the first match of the $keyword.

In the example, I've included lorem ipsum text as an example database result paragraph and set the $keyword to 'scelerisque'. You'd obviously replace these in your code.

//example paragraph text
$lorum = 'Nunc nec magna at nibh imperdiet dignissim quis eu velit. 
vel mattis odio rutrum nec. Etiam sit amet tortor nibh, molestie 
vestibulum tortor. Integer condimentum magna dictum purus vehicula 
et scelerisque mauris viverra. Nullam in lorem erat. Ut dolor libero, 
tristique et pellentesque sed, mattis eget dui. Cum sociis natoque 
penatibus et magnis dis parturient montes, nascetur ridiculus mus. 
.';

//turn paragraph into array
$ipsum = explode(' ', $lorum);
//set keyword
$keyword = 'scelerisque';
//set excerpt distance (number of words on each side of the match)
$distance = 10;
$excerpt = '';

//look for keyword in paragraph array, return array key of first match
//(false when not found; 0 is a valid key, so don't test it with empty())
$match_key = array_search($keyword, $ipsum);

if ($match_key !== false) {
    $words = array();
    foreach ($ipsum as $key => $value) {
        //if paragraph array key is inside excerpt distance
        if ($key >= $match_key - $distance && $key <= $match_key + $distance) {
            //if array key matches keyword key, bold the word
            $words[] = ($key == $match_key) ? '<b>'.$value.'</b>' : $value;
        }
    }
    //turn excerpt array into a string
    $excerpt = implode(' ', $words);
}
//print the string
echo $excerpt;

$excerpt returns: "molestie vestibulum tortor. Integer condimentum magna dictum purus vehicula et <b>scelerisque</b> mauris viverra. Nullam in lorem erat. Ut dolor libero, tristique"
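Note that array_search() only finds an exact, case-sensitive match, so the keyword is missed when it is capitalized differently or has punctuation attached (for example "Scelerisque," or "viverra."). A sketch of a more tolerant variant, assuming plain-text input; excerptAround() is a hypothetical name. It splits on any whitespace, compares words case-insensitively with surrounding punctuation stripped, and takes the window with array_slice():

```php
<?php
// Hypothetical variant of the excerpt code: tolerant keyword match plus array_slice().
function excerptAround(string $text, string $keyword, int $distance): ?string
{
    // Split on any run of whitespace (spaces or newlines), dropping empty pieces.
    $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $i => $word) {
        // Ignore leading/trailing punctuation and letter case when comparing.
        $bare = preg_replace('/^\W+|\W+$/u', '', $word);
        if (strcasecmp($bare, $keyword) !== 0) {
            continue;
        }
        // Up to $distance words before the match, the match, and up to $distance after.
        $start = max(0, $i - $distance);
        $slice = array_slice($words, $start, $i - $start + $distance + 1);
        $slice = array_map('htmlspecialchars', $slice);
        $slice[$i - $start] = '<b>' . $slice[$i - $start] . '</b>';
        return implode(' ', $slice);
    }
    return null; // keyword not found
}
```

Returning null (rather than an empty string) makes "no match" easy to tell apart from an empty post.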

Upvotes: 1

shmeeps

Reputation: 7833

If you wish to cut out the relevant parts of the paragraph after highlighting the terms with str_replace(), you can use stripos() to find the position of each <strong> section and use an offset from that position with substr() to cut out a section of the paragraph, such as:

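// $searchterms: array of the user's search terms; $paragraph: the post text
foreach ($searchterms as $search)
{
    $paragraph = str_replace($search, "<strong>$search</strong>", $paragraph);
}

$section = array();
$pos = 0;

for ($i = 0; $i < 4; $i++)
{
    $pos = stripos($paragraph, "<strong>", $pos);
    if ($pos === false) break; // fewer than 4 highlighted terms
    // max() keeps the start from going negative (a negative start counts from the end)
    $section[$i] = substr($paragraph, max(0, $pos - 100), 200);
    $pos += strlen("<strong>"); // continue searching after this match
}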

This gives you an array of up to four short snippets (up to 200 characters each) to use as you wish. It may also be beneficial to search for the nearest space from the cutting locations and cut from there, to prevent half-words. Oh, and you also need to check for errors, but I'll leave that up to you.
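The nearest-space idea can be sketched like this; snippetAt() is a hypothetical helper that takes a byte position (such as one returned by stripos()), cuts about $radius bytes on each side, and then widens both ends to the nearest space so no word is cut in half. Since the cuts land on spaces, this also avoids splitting a multibyte UTF-8 character, except at the very start or end of the text.

```php
<?php
// Hypothetical helper: cut about $radius bytes around $pos, widening both ends
// to the nearest space so that no word is cut in half.
function snippetAt(string $text, int $pos, int $radius = 100): string
{
    $length = strlen($text);
    $start = max(0, $pos - $radius);
    $end = min($length, $pos + $radius);
    // Move the start back to just after the previous space (or to the beginning).
    if ($start > 0) {
        $space = strrpos(substr($text, 0, $start), ' ');
        $start = ($space === false) ? 0 : $space + 1;
    }
    // Move the end forward to the next space (or to the end of the text).
    $space = ($end < $length) ? strpos($text, ' ', $end) : false;
    $end = ($space === false) ? $length : $space;
    return substr($text, $start, $end - $start);
}
```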

Upvotes: 2

thejh

Reputation: 45578

Maybe you could do something like this when you're connected to the database:

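// assumes $pdo is an open PDO connection (the old mysql_* functions were removed in PHP 7)
$keyword = $_REQUEST["keyword"] ?? ''; //fetch the keyword from the request
$stmt = $pdo->prepare("SELECT * FROM `posts` WHERE `content` LIKE ?"); //ask the database for the post texts
$stmt->execute(array('%' . $keyword . '%')); //a bound parameter avoids SQL injection
$flags = ENT_QUOTES | ENT_SUBSTITUTE;
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) { //do the following for each result:
  $text = $row["content"]; //we're only interested in the content at the moment
  $pos = stripos($text, $keyword); //first match, case-insensitive like MySQL's usual collations
  if ($pos === false) continue;
  $text = substr($text, max(0, $pos - 150), 300); //cut out (a negative start would count from the end)
  $text = htmlspecialchars($text, $flags); //escape first, so the <strong> tags added next survive
  $text = preg_replace('/' . preg_quote(htmlspecialchars($keyword, $flags), '/') . '/i', '<strong>$0</strong>', $text); //highlight, keeping the original case
  echo $text; //print it
  echo "<hr>"; //draw a line under it
}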
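Binding or escaping the keyword prevents SQL injection, but % and _ inside the keyword still act as LIKE wildcards, so a search for "100%" matches more than intended. A minimal sketch, assuming backslash is the LIKE escape character (MySQL's default); escapeLike() is a hypothetical helper:

```php
<?php
// Hypothetical helper: escape the LIKE wildcards % and _ (and the escape
// character itself) so they match literally. Assumes backslash is the LIKE
// escape character, which is MySQL's default.
function escapeLike(string $s): string
{
    return strtr($s, ['\\' => '\\\\', '%' => '\%', '_' => '\_']);
}

$pattern = '%' . escapeLike('100%') . '%'; // matches the text "100%" literally
```

Apply it to the keyword before adding the surrounding % signs, then bind or escape the resulting value as usual.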

Upvotes: 2

Luciano Mammino

Reputation: 811

If you're a beginner, this will not be as easy as some might think...

I think you should do the following steps:

  1. build a query based on what the user searched for (beware of SQL injection)
  2. fetch the results and organize them (an array should be fine)
  3. build the HTML code from the previous array

In the third step you can use a regular expression to replace the searched keywords with a bolded equivalent. str_replace() could work too...

I hope this helps... If you could provide your database structure maybe I can give you some more precise hints...

Upvotes: 0
