Reputation: 433
I need to highlight a keyword in a paragraph, as Google does in its search results. Let's assume that I have a MySQL database with blog posts. When a user searches for a keyword, I want to return the posts that contain it, but show only parts of each post (the paragraphs that contain the searched keyword) with the keyword highlighted.
My plan is this:
Can you help me with some logic, or at least tell me if my logic is OK? I'm at the PHP learning stage.
Upvotes: 2
Views: 5390
Reputation: 7335
Browsers have a native API for doing this client-side: the CSS Custom Highlight API. It requires a bit of JavaScript, but with a third-party library like highlight-search-term, it's a one-liner:
<script type="module">
import { highlightSearchTerm } from "https://cdn.jsdelivr.net/npm/[email protected]/src/index.js";
highlightSearchTerm({ search: 'KEYWORD', selector: ".content" });
</script>
Put the above snippet at the end of the page body, replacing KEYWORD with your search keyword and .content with the CSS selector of the page element(s) where you want to highlight words.
More examples at https://www.npmjs.com/package/highlight-search-term
Upvotes: 0
Reputation: 890
I found this post when doing a search for how to highlight keyword search results. My requirements were:
I am fetching my data from a MySQL database, which doesn't contain elements, by design of the form which stores the data.
Here is the code I found most useful:
$keywords = array("fox","jump","quick");
$string = "The quick brown fox jumps over the lazy dog";
$test = "The quick brown fox jumps over the lazy dog"; // used to compare values at the end.
if(isset($keywords)) // For keyword search this will highlight all keywords in the results.
{
    foreach($keywords as $word)
    {
        // preg_quote() keeps regex metacharacters in a keyword from breaking the pattern.
        $pattern = "/\b".preg_quote($word, "/")."\b/i";
        // $0 is the matched text, so the capitalisation found in the string is preserved.
        $string = preg_replace($pattern, '<span class="highlight">$0</span>', $string);
    }
}
// We must compare the original string to the string altered in the loop to avoid having a string printed with no matches.
if($string === $test)
{
    echo "No match";
}
else
{
    echo $string;
}
Output:
The <span class="highlight">quick</span> brown <span class="highlight">fox</span> jumps over the lazy dog
I hope this helps someone.
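One caveat with replacing keyword by keyword: a later keyword such as "span" or "class" can match inside the markup inserted for an earlier one. A minimal sketch of a single-pass variant, assuming the stored text is plain text that still needs HTML-escaping; the function name highlightKeywords is made up for this illustration:

```php
<?php
// Hypothetical helper (not part of the answer above): highlight all keywords in one pass.
function highlightKeywords(string $text, array $keywords): string
{
    if ($keywords === []) {
        return htmlspecialchars($text);
    }
    // Quote each keyword so regex metacharacters are matched literally.
    $alternation = implode('|', array_map(function ($k) {
        return preg_quote($k, '/');
    }, $keywords));
    // Split on whole-word matches; the captured keywords land at the odd indexes.
    $parts = preg_split('/\b(' . $alternation . ')\b/i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        // Escape every piece, then wrap only the keyword pieces.
        $escaped = htmlspecialchars($part);
        $parts[$i] = ($i % 2 === 1)
            ? '<span class="highlight">' . $escaped . '</span>'
            : $escaped;
    }
    return implode('', $parts);
}

echo highlightKeywords('The Quick brown fox jumps over the lazy dog', array('fox', 'jump', 'quick'));
// prints: The <span class="highlight">Quick</span> brown <span class="highlight">fox</span> jumps over the lazy dog
```

Because the text is split once and each piece is escaped separately, the inserted span tags are never scanned for keywords again.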
Upvotes: 1
Reputation: 655319
Here’s a solution for plain text:
$str = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';
$keywords = array('co');
$wordspan = 5;
$keywordsPattern = implode('|', array_map(function($val) { return preg_quote($val, '/'); }, $keywords));
$matches = preg_split("/($keywordsPattern)/ui", $str, -1, PREG_SPLIT_DELIM_CAPTURE);
for ($i = 0, $n = count($matches); $i < $n; ++$i) {
if ($i % 2 == 0) {
$words = preg_split('/(\s+)/u', $matches[$i], -1, PREG_SPLIT_DELIM_CAPTURE);
if (count($words) > ($wordspan+1)*2) {
$matches[$i] = '…';
if ($i > 0) {
$matches[$i] = implode('', array_slice($words, 0, ($wordspan+1)*2)) . $matches[$i];
}
if ($i < $n-1) {
$matches[$i] .= implode('', array_slice($words, -($wordspan+1)*2));
}
}
} else {
$matches[$i] = '<b>'.$matches[$i].'</b>';
}
}
echo implode('', $matches);
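The pattern passed to preg_split decides what gets highlighted. A small sketch comparing three variants (plain subword match, whole-word match with \b, and whole-word expansion with \w*) on a short sample; the helper highlightWith and the sample text are made up for this illustration:

```php
<?php
// Hypothetical helper: wrap every match of the first capture group in <b> tags.
function highlightWith(string $pattern, string $text): string
{
    return preg_replace($pattern, '<b>$1</b>', $text);
}

$keywordsPattern = preg_quote('co', '/');
$text = 'commodo co consequat';

// Subwords are matched and only the matched part is highlighted.
echo highlightWith("/($keywordsPattern)/ui", $text), "\n";
// prints: <b>co</b>mmodo <b>co</b> <b>co</b>nsequat

// Only the standalone word "co" is highlighted.
echo highlightWith("/\b($keywordsPattern)\b/ui", $text), "\n";
// prints: commodo <b>co</b> consequat

// Every word that contains "co" is highlighted as a whole.
echo highlightWith("/(\w*?(?:$keywordsPattern)\w*)/ui", $text), "\n";
// prints: <b>commodo</b> <b>co</b> <b>consequat</b>
```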
With the current pattern "/($keywordsPattern)/ui", subwords are matched and highlighted. But you can change that if you want to:
If you want to match only whole words and not just subwords, use word boundaries \b:
"/\b($keywordsPattern)\b/ui"
If you want to match subwords but highlight the whole word, put optional word characters \w before and after the keywords:
"/(\w*?(?:$keywordsPattern)\w*)/ui"
Upvotes: 1
Reputation: 165201
If it contains html (note that this is a pretty robust solution):
$string = '<p>foo<b>bar</b></p>';
$keyword = 'foo';
$dom = new DomDocument();
$dom->loadHtml($string);
$xpath = new DomXpath($dom);
$elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
foreach ($elements as $element) {
foreach ($element->childNodes as $child) {
if (!$child instanceof DomText) continue;
$fragment = $dom->createDocumentFragment();
$text = $child->textContent;
$stubs = array();
while (($pos = stripos($text, $keyword)) !== false) {
$fragment->appendChild(new DomText(substr($text, 0, $pos)));
$word = substr($text, $pos, strlen($keyword));
$highlight = $dom->createElement('span');
$highlight->appendChild(new DomText($word));
$highlight->setAttribute('class', 'highlight');
$fragment->appendChild($highlight);
$text = substr($text, $pos + strlen($keyword));
}
if (!empty($text)) $fragment->appendChild(new DomText($text));
$element->replaceChild($fragment, $child);
}
}
$string = $dom->saveXml($dom->getElementsByTagName('body')->item(0)->firstChild);
Results in:
<p><span class="highlight">foo</span><b>bar</b></p>
And with:
$string = '<body><p>foobarbaz<b>bar</b></p></body>';
$keyword = 'bar';
You get (broken onto multiple lines for readability):
<p>foo
<span class="highlight">bar</span>
baz
<b>
<span class="highlight">bar</span>
</b>
</p>
Beware of non-DOM solutions (like regex or str_replace), since highlighting something like "div" tends to completely destroy your HTML. The DOM approach will only ever "highlight" strings in the body, never inside a tag.
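To make the warning concrete, here is a minimal sketch of what a plain str_replace does when the keyword is also a tag name; the sample markup is made up for this illustration:

```php
<?php
$html = '<div class="post">A div element groups content.</div>';
$keyword = 'div';

// Naive replacement: str_replace cannot tell tag names from text,
// so the opening and closing <div> tags get rewritten too.
$broken = str_replace($keyword, '<span class="highlight">' . $keyword . '</span>', $html);

echo $broken, "\n";
// prints: <<span class="highlight">div</span> class="post">A <span class="highlight">div</span> element groups content.</<span class="highlight">div</span>>
```

Only the middle occurrence is real text; the other two replacements land inside the tags and leave the markup invalid.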
Edit: Since you want Google-style results, here's one way of doing it:
function getKeywordStubs($string, array $keywords, $maxStubSize = 10) {
$dom = new DomDocument();
$dom->loadHtml($string);
$xpath = new DomXpath($dom);
$results = array();
$maxStubHalf = ceil($maxStubSize / 2);
foreach ($keywords as $keyword) {
$elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
$replace = '<span class="highlight">'.$keyword.'</span>';
foreach ($elements as $element) {
$stub = $element->textContent;
$regex = '#^.*?((\w*\W*){'.
$maxStubHalf.'})('.
preg_quote($keyword, '#').
')((\w*\W*){'.
$maxStubHalf.'}).*?$#ims';
// Keep only the words around the keyword (groups 1 and 4) plus the keyword itself (group 3).
$stub = preg_replace($regex, '\\1\\3\\4', $stub);
$stub = str_ireplace($keyword, $replace, $stub);
$results[] = $stub;
}
}
$results = array_unique($results);
return $results;
}
OK, so what that does is return an array of matches, each with up to $maxStubSize words around the keyword (namely up to half that number before, and half after)...
So, given a string:
<p>a whole
<b>bunch of</b> text
<a>here for</a>
us to foo bar baz replace out from this string
<b>bar</b>
</p>
Calling getKeywordStubs($string, array('bar', 'bunch')) will result in:
array(4) {
[0]=>
string(75) "here for us to foo <span class="highlight">bar</span> baz replace out from "
[3]=>
string(34) "<span class="highlight">bar</span>"
[4]=>
string(62) "a whole <span class="highlight">bunch</span> of text here for "
[7]=>
string(39) "<span class="highlight">bunch</span> of"
}
So, then you could build your result blurb by sorting the list by strlen and then picking the two longest matches... (assuming PHP 5.3+):
usort($results, function($str1, $str2) {
return strlen($str2) - strlen($str1);
});
$description = implode('...', array_slice($results, 0, 2));
Which results in:
here for us to foo <span class="highlight">bar</span> baz replace out...a whole <span class="highlight">bunch</span> of text here for
I hope that helps... (I do feel this is a bit... bloated... I'm sure there are better ways to do this, but here's one way)...
Upvotes: 9
Reputation: 4984
You could try exploding your database search result set into an array using explode and then use array_search() on each search result. Set the $distance variable in the example below to how many words you'd like to appear on either side of the first match of the $keyword.
In the example, I've included lorem ipsum text as an example database result paragraph and set the $keyword to 'scelerisque'. You'd obviously replace these in your code.
//example paragraph text
$lorum = 'Nunc nec magna at nibh imperdiet dignissim quis eu velit.
vel mattis odio rutrum nec. Etiam sit amet tortor nibh, molestie
vestibulum tortor. Integer condimentum magna dictum purus vehicula
et scelerisque mauris viverra. Nullam in lorem erat. Ut dolor libero,
tristique et pellentesque sed, mattis eget dui. Cum sociis natoque
penatibus et magnis dis parturient montes, nascetur ridiculus mus.
.';
//turn paragraph into array
$ipsum = explode(' ',$lorum);
//set keyword
$keyword = 'scelerisque';
//set excerpt distance
$distance = 10;
//look for keyword in paragraph array, return array key of first match
$match_key = array_search($keyword,$ipsum);
//start with an empty excerpt so the final echo also works when nothing matches
$excerpt = '';
//array_search() returns false on no match; key 0 is a valid match, so compare strictly
if($match_key !== false){
    $words = array();
    foreach($ipsum as $key=>$value){
        //if paragraph array key inside excerpt distance
        if($key > $match_key-$distance and $key< $match_key+$distance){
            //if array key matches keyword key, bold the word
            if($key == $match_key){
                $word = '<b>'.$value.'</b>';
            }
            else{
                $word = $value;
            }
            //collect the words within distance
            $words[] = $word;
        }
    }
    //turn the collected words into a string
    $excerpt = implode(' ',$words);
}
//print the string
echo $excerpt;
$excerpt returns:
"vestibulum tortor. Integer condimentum magna dictum purus vehicula et scelerisque mauris viverra. Nullam in lorem erat. Ut dolor libero,"
Upvotes: 1
Reputation: 7833
If you wish to cut out the relevant paragraphs after applying the str_replace approach mentioned above, you can use stripos() to find the position of these strong sections, and use an offset from that location with substr() to cut out a section of the paragraph, such as:
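// $searchterms is the array of search terms, $paragraph is the post text
foreach ($searchterms as $search) {
    $paragraph = str_replace($search, "<strong>$search</strong>", $paragraph);
}
$pos = 0;
for ($i = 0; $i < 4; $i++) {
    $pos = stripos($paragraph, "<strong>", $pos);
    if ($pos === false) break; // no further highlighted section
    $section[$i] = substr($paragraph, max(0, $pos - 100), 200);
    $pos++; // continue the search after this match
}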
which will give you an array of small sentences (up to 200 characters each) to use how you wish. It may also be beneficial to search for the nearest space from the cutting locations, and cut from there to prevent half-words. Oh, and you also need to check for errors, but I'll leave that up to you.
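The "cut at the nearest space" idea could be sketched like this. It is only an illustration under stated assumptions: the helper name excerptAround and its $radius parameter are made up, and it works on byte offsets, so it assumes single-byte text (for UTF-8 the mb_* functions would be the place to look):

```php
<?php
// Hypothetical helper: cut roughly $radius characters either side of $pos,
// then move both cut points to the nearest space so no word is split.
function excerptAround(string $text, int $pos, int $radius = 100): string
{
    $start = max(0, $pos - $radius);
    $end = min(strlen($text), $pos + $radius);

    if ($start > 0) {
        // First space at or after the character just before the raw start.
        $space = strpos($text, ' ', $start - 1);
        if ($space !== false && $space < $pos) {
            $start = $space + 1;
        }
    }
    if ($end < strlen($text)) {
        // Last space at or before the raw end.
        $space = strrpos(substr($text, 0, $end + 1), ' ');
        if ($space !== false && $space > $pos) {
            $end = $space;
        }
    }
    return substr($text, $start, $end - $start);
}

$paragraph = 'The quick brown fox jumps over the lazy dog near the river bank';
$pos = stripos($paragraph, 'jumps');
echo excerptAround($paragraph, $pos, 12);
// prints: brown fox jumps over
```

Without the adjustment, the same cut would start in the middle of "quick" and end in the middle of "the".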
Upvotes: 2
Reputation: 45578
Maybe you could do something like this when you're connected to the database:
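$keyword = $_REQUEST["keyword"]; //fetch the keyword from the request
//note: the mysql_* functions were removed in PHP 7; use mysqli or PDO with prepared statements there
$result = mysql_query("SELECT * FROM `posts` WHERE `content` LIKE '%".
    mysql_real_escape_string($keyword)."%'"); //ask the database for the posttexts
while ($row = mysql_fetch_array($result)) {//do the following for each result:
    $text = $row["content"];//we're only interested in the content at the moment
    $pos = (int) strripos($text, $keyword); //last occurrence, case-insensitive
    $text = substr($text, max(0, $pos - 150), 300); //cut out, never starting before the beginning
    $text = htmlentities($text); //escape first, so the <strong> tags added next are not escaped
    $text = preg_replace('/'.preg_quote(htmlentities($keyword), '/').'/i', '<strong>$0</strong>', $text); //highlight
    echo $text; //print it
    echo "<hr>";//draw a line under it
}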
Upvotes: 2
Reputation: 811
If you're a beginner, this will not be as easy as one might think...
I think you should do the following steps:
In the third step you can use a regular expression to replace the keywords the user searched for with a bolded equivalent. str_replace could work too...
I hope this helps... If you could provide your database structure maybe I can give you some more precise hints...
Upvotes: 0