gen_Eric

Reputation: 227270

Backreferences using OR

I was trying to match the first and last letters of a few words that may appear in a string, using a regex with | (alternation).

Let's take the following string:

The quick brown fox jumps over the lazy dog

I want to match either fox or dog, so I made the following regex:

/fox|dog/

Using PHP's preg_replace, that regex works correctly:

$str = 'The quick brown fox jumps over the lazy dog';
echo preg_replace('/fox|dog/', '=>$0<=', $str);

This echoes:

The quick brown =>fox<= jumps over the lazy =>dog<=

That's not quite the result I want. So, starting with that regex, I tried to modify it so that the result would look like this:

The quick brown =>f...x<= jumps over the lazy =>d...g<=

I tried with this code:

$str = 'The quick brown fox jumps over the lazy dog';
echo preg_replace('/(f)o(x)|(d)o(g)/', '=>$1...$2<=', $str);

This did not produce what I wanted. It echoed:

The quick brown =>f...x<= jumps over the lazy =>...<=

After some debugging, I figured out why. I had assumed that, because I was using |, each alternative would get its own numbered groups, but that is not how it works. The f is backreference 1 as I expected, but the d is not: the d is actually backreference 3! This is because groups are numbered across the entire regex, not just within whichever side of the | matched. Yet backreference 0 is always the whole matched word (fox or dog), so I'm a little confused.
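The numbering described above can be seen directly with preg_match. This is a small illustrative sketch; the variable names are arbitrary and the two test subjects are just the words from the question:

```php
<?php
// Groups are numbered left to right across the whole pattern,
// regardless of which side of the | actually matched.
$pattern = '/(f)o(x)|(d)o(g)/';

preg_match($pattern, 'fox', $m1);
preg_match($pattern, 'dog', $m2);

// For 'fox', f and x are in groups 1 and 2; the unused trailing
// groups 3 and 4 are simply left out of the array.
print_r($m1);

// For 'dog', groups 1 and 2 did not participate and come back as
// empty strings, while d and g land in groups 3 and 4.
print_r($m2);
```

This is why '=>$1...$2<=' produces '=>...<=' for dog: $1 and $2 are empty for that match.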

How can I use backreferences to match the first and last letters of multiple words?

I found a solution using preg_replace_callback, but I was wondering if I could get this same result using backreferences.

$str = 'The quick brown fox jumps over the lazy dog';
echo preg_replace_callback('/fox|dog/', function($matches){
    $a = $matches[0];
    return '=>'.$a[0].'...'.$a[strlen($a)-1].'<=';
}, $str);
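For this exact case, PCRE (the engine behind PHP's preg_* functions) also has a branch reset group, (?| ... ), in which every alternative starts numbering its groups from the same number. A minimal sketch, assuming a PHP build whose PCRE supports branch reset (PCRE 7.2 or later, which current PHP versions bundle):

```php
<?php
$str = 'The quick brown fox jumps over the lazy dog';

// Inside (?| ... ), both alternatives number their groups from 1,
// so f/d land in $1 and x/g land in $2, whichever word matched.
$result = preg_replace('/(?|(f)o(x)|(d)o(g))/', '=>$1...$2<=', $str);

echo $result, "\n";
```

This keeps the original shape of the regex from the question and only wraps the alternation in (?| ... ).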

Upvotes: 2

Views: 104

Answers (2)

nhahtdh

Reputation: 56809

Actually, the callback function method is superior to everything below, since it works in all cases. You can even show more or fewer characters depending on the length of the match.


Below is my initial answer, which is inferior to what I described above.

In general, you can do it like this:

/(?=(.))(?:pattern)(?<=(.))/s

Fill in pattern with your pattern. I used the s flag so that . truly matches any character, including newlines. The pattern does not need to be inside a non-capturing group if the original pattern has no | at the top level.
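As an illustration, here is the general template with pattern filled in as fox|dog from the question. This is a sketch of the lookaround approach described above, using the same replacement string the question tried:

```php
<?php
$str = 'The quick brown fox jumps over the lazy dog';

// (?=(.)) peeks at and captures the first character of the match;
// (?<=(.)) looks back from the end and captures the last character.
// Both words therefore share groups 1 and 2.
$result = preg_replace('/(?=(.))(?:fox|dog)(?<=(.))/s', '=>$1...$2<=', $str);

echo $result, "\n";
```

The lookbehind only ever covers one character, so it stays fixed-length, which PCRE requires.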

You still need to check the length of the text matched by pattern before doing the replacement, though (especially when the length is 1, and possibly also 2). This is easy to do with a replace callback function.
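A minimal sketch of that length check with preg_replace_callback. The subject string and the word list ant|an|a are hypothetical, chosen only so that matches of length 1, 2 and 3 all occur:

```php
<?php
$str = 'a is an ant';

// Hypothetical pattern for illustration: the whole words a, an, or ant,
// wrapped in the (?=(.))(?:pattern)(?<=(.)) template.
$pattern = '/(?=(.))(?:\b(?:ant|an|a)\b)(?<=(.))/s';

$result = preg_replace_callback($pattern, function ($m) {
    // For a 1-character match, $m[1] and $m[2] are the same character,
    // and for 2 characters there is nothing to elide, so keep the text.
    if (strlen($m[0]) <= 2) {
        return '=>' . $m[0] . '<=';
    }
    return '=>' . $m[1] . '...' . $m[2] . '<=';
}, $str);

echo $result, "\n";
```

Here a and an are kept whole while ant becomes =>a...t<=; the threshold of 2 is a choice, not a rule.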

However, note that the method above may not work well with a pattern whose minimum match length is 0.

Upvotes: 2

Bergi

Reputation: 664620

I would use a lookahead (which checks the text without consuming it) for this:

/(?=fox|dog)(f|d)o(x|g)/

(didn't test in PHP, but works in JS)

It first tests whether the text that follows is one of the searched words, and then matches the first and last letters, using a single capturing group for each. However, this method gets far more complicated if the words are not as similar (here: same length, same middle letter[s]).
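Since the regex above was only tested in JavaScript, here is a sketch of the same pattern with PHP's preg_replace, reusing the string and replacement from the question:

```php
<?php
$str = 'The quick brown fox jumps over the lazy dog';

// The lookahead (?=fox|dog) only checks that one of the words starts here;
// (f|d) and (x|g) then capture the first and last letters, so both words
// use groups 1 and 2.
$result = preg_replace('/(?=fox|dog)(f|d)o(x|g)/', '=>$1...$2<=', $str);

echo $result, "\n";
```

The lookahead is what stops (f|d)o(x|g) from also matching mixed words such as fog or dox.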

Upvotes: 2
