user2096091

Reputation: 105

Using preg_replace to get characters before/after match

I have the following code to get characters before/after the regex match:

$searchterm = 'blue';
$string = 'Here is a sentence talking about blue.  This sentence talks about red.';
$regex = '/.*(.{10}\b' . $searchterm . '\b.{10}).*/si';
echo preg_replace($regex, '$1', $string);

Output: "ing about blue.  This se" (expected).

When I change $searchterm to 'red', I get the entire string back:

Output: "Here is a sentence talking about blue.  This sentence talks about red."

I expected "lks about red." The same thing happens when the match is at the start of the string. Is there a way to use a similar regex that does not return the entire string when the match is near the start or end?

Example of what is happening: https://sandbox.onlinephpfunctions.com/code/e500b505860ded429e78869f61dbf4128ff368b3

Upvotes: 1

Views: 276

Answers (2)

anubhava

Reputation: 785196

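Converting my comment to an answer so that the solution is easy to find for future visitors.

Your regex is almost correct. With .{10}, the pattern requires exactly ten characters on each side of the term, so it cannot match near the start or end of the string, and preg_replace then returns the input unchanged. Use a .{0,10} limit for the surrounding substring and make the leading .* non-greedy (otherwise the greedy .* consumes the leading context and .{0,10} matches nothing before the term):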

$searchterm = 'blue';
$string = 'Here is a sentence talking about blue.  This sentence talks about red.';
$regex = '/.*?(.{0,10}\b' . $searchterm . '\b.{0,10}).*/si';
echo preg_replace($regex, '$1', $string);
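
To see how the regex above behaves at both ends of the string, here is a minimal sketch. The wrapper function contextViaReplace is a hypothetical name introduced only for this illustration; the pattern itself is the one shown above.

```php
<?php
// Hypothetical helper (not part of the original answer): wraps the
// lazy-prefix regex so it can be tried with several search terms.
function contextViaReplace(string $searchterm, string $string): string
{
    // .*? is lazy, so the capture group keeps up to 10 chars of leading context.
    $regex = '/.*?(.{0,10}\b' . $searchterm . '\b.{0,10}).*/si';
    return preg_replace($regex, '$1', $string);
}

$string = 'Here is a sentence talking about blue.  This sentence talks about red.';

echo contextViaReplace('blue', $string), "\n"; // ing about blue.  This se
echo contextViaReplace('red', $string), "\n";  // lks about red.
echo contextViaReplace('Here', $string), "\n"; // Here is a sent
```

Note that when the term does not occur at all, preg_replace still returns the input string unchanged, so a caller that needs to detect "no match" has to check for that separately.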


Upvotes: 6

Wiktor Stribiżew

Reputation: 626870

It is better to use preg_match with .{0,10} quantifiers instead of .{10}:

function truncateString($searchterm){
    $string = 'Here is a sentence talking about blue.  This sentence talks about red.';
    $regex = '/.{0,10}\b' . $searchterm . '\b.{0,10}/si';
    if (preg_match($regex, $string, $m)) {
        echo $m[0] . "\n";
    }  
}

truncateString('blue');
// => ing about blue.  This se
truncateString('red');
// => lks about red.


preg_match will find and return the first match only. The .{0,10} pattern will match zero to ten occurrences of any char (since the s modifier is used, the . matches even line break chars).
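
A small sketch of the effect of the s modifier and of the first-match behavior. The sample text is made up for this illustration and contains the term twice, with a line break in between.

```php
<?php
// Hypothetical sample text: "blue" occurs twice, separated by a line break.
$text = "Sky is blue.\nThe blue car left.";

// Without the s modifier, "." does not match "\n", so the trailing
// context stops at the end of the first line.
preg_match('/.{0,10}\bblue\b.{0,10}/i', $text, $noDotAll);

// With the s modifier, "." also matches "\n", so the trailing context
// continues onto the next line.
preg_match('/.{0,10}\bblue\b.{0,10}/si', $text, $dotAll);

// json_encode is used only to make the line break visible in the output.
echo json_encode($noDotAll[0]), "\n"; // "Sky is blue."
echo json_encode($dotAll[0]), "\n";   // "Sky is blue.\nThe blue"
```

In both calls only the first occurrence is reported in index 0 of the match array, even though the term appears twice in the text.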

One more thing: if $searchterm can contain special regex metacharacters anywhere in the term, consider changing the regex to

$regex = '/.{0,10}(?<!\w)' . preg_quote($searchterm, '/') . '(?!\w).{0,10}/si';

where (?<!\w) and (?!\w) are unambiguous word boundaries and preg_quote escapes all special characters in the term, including the / delimiter passed as its second argument.
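
As a rough illustration of why the escaping and the lookarounds matter, the sketch below searches for terms that contain +. The function name contextAround, its $width parameter, and the sample text are assumptions made for this example, not part of the original code.

```php
<?php
// Hypothetical helper: returns the term with up to $width characters of
// context on each side, or null when the term is not found.
function contextAround(string $searchterm, string $string, int $width = 10): ?string
{
    $regex = '/.{0,' . $width . '}(?<!\w)'
        . preg_quote($searchterm, '/')   // escape metacharacters and the delimiter
        . '(?!\w).{0,' . $width . '}/si';
    return preg_match($regex, $string, $m) ? $m[0] : null;
}

// Made-up sample text with terms that contain regex metacharacters.
$text = 'I write c++ and a+b daily.';

echo contextAround('c++', $text), "\n"; // I write c++ and a+b d
echo contextAround('a+b', $text), "\n"; // e c++ and a+b daily.
var_dump(contextAround('java', $text)); // NULL
```

A plain \b after "c++" would not match here, because both the final + and the following space are non-word characters; (?!\w) only requires that no word character follows, so it still succeeds.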

Upvotes: 2
