Joyce Babu

Reset the position pointer to replace all occurrences of a pattern up to a delimiter in a single step using preg_replace

Is it possible to use preg_replace to replace all occurrences of a pattern until a specified delimiter?

I want to replace multiple occurrences of a pattern, not the whole string before the delimiter.

Is it possible to do this in a single step, without splitting the string? Is it possible to specify that the position pointer should be reset to the beginning after every replacement? Can I use lookahead to achieve this?

For example, I want to replace every occurrence of // in the following URLs with a single /, up to the ? character (the // after https: must stay).

Input:

https://www.example.com//abc/def/ghi/?jkl=mno//pqr
https://www.example.com//abc/def//ghi/?jkl=mno//pqr
https://www.example.com//abc//def//ghi/?jkl=mno//pqr

Expected output (the same for all three URLs):

https://www.example.com/abc/def/ghi/?jkl=mno//pqr

Please note

revo

The currently accepted answer works, but it has some issues that may cause problems later:

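  • It doesn't actually stop matching at the first ?. The lookahead (?=.*\?) only requires some ? later in the string, so a // between two ? characters is still replaced.

  • It only works for the https scheme; other schemes (e.g. http) have to be added to the lookbehind manually.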
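
Both issues are easy to reproduce by running the accepted answer's pattern, #(?<!https:)//(?=.*\?)#, against two extra URLs. This is a minimal sketch: the function name lookaheadVersion and both URLs are hypothetical, chosen only to trigger each problem.

```php
<?php
// Pattern from the accepted answer (after its lookbehind fix): "//" not
// preceded by "https:" and followed somewhere later by a "?".
function lookaheadVersion(string $url): string
{
    return preg_replace('#(?<!https:)//(?=.*\?)#', '/', $url);
}

// Hypothetical URLs chosen to trigger each issue.
echo lookaheadVersion('https://www.example.com//abc/?jkl=mno//pqr?x=1'), "\n";
// https://www.example.com/abc/?jkl=mno/pqr?x=1  (query-string "//" was changed)

echo lookaheadVersion('http://www.example.com//abc/?jkl=mno'), "\n";
// http:/www.example.com/abc/?jkl=mno  (scheme "//" was collapsed)
```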

Regex:

(^\w+:/|\G[^?/]*)/+

The regex above uses \G, which matches only at the position where the previous match ended. Since [^?/]* cannot consume a ?, the chain of back-to-back matches breaks at the first ?, and nothing after it is replaced.
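
One way to watch \G at work is to list the individual matches with preg_match_all and PREG_OFFSET_CAPTURE. In this sketch (the sample URL is an assumption for illustration), each match starts exactly where the previous one ended, and the chain stops at the ?, so the // in the query string is never matched.

```php
<?php
$url = 'https://www.example.com//abc/?jkl=mno//pqr';

// Collect every match together with its starting byte offset.
preg_match_all('@(^\w+:/|\G[^?/]*)/+@', $url, $m, PREG_OFFSET_CAPTURE);

foreach ($m[0] as [$text, $offset]) {
    printf("%2d  %s\n", $offset, $text);
}
// Prints:
//  0  https://
//  8  www.example.com//
// 25  abc/
```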

PHP:

echo preg_replace('@(^\w+:/|\G[^?/]*)/+@', '$1/', $url);
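
A small self-contained script applying this pattern to the three URLs from the question, plus a URL containing a second ? and a plain http URL, might look like this. The wrapper name collapseSlashesBeforeQuery and the last two URLs are illustrative assumptions, not part of the original answer.

```php
<?php
// Either the scheme prefix ("https:/") at the very start, or a run of
// characters other than "?" and "/" that must begin exactly where the
// previous match ended (\G). The trailing /+ swallows one or more slashes,
// and "$1/" puts back the captured part followed by a single slash.
function collapseSlashesBeforeQuery(string $url): string
{
    return preg_replace('@(^\w+:/|\G[^?/]*)/+@', '$1/', $url);
}

$urls = [
    'https://www.example.com//abc/def/ghi/?jkl=mno//pqr',
    'https://www.example.com//abc/def//ghi/?jkl=mno//pqr',
    'https://www.example.com//abc//def//ghi/?jkl=mno//pqr',
    'https://www.example.com//abc/?jkl=mno//pqr?x=1', // hypothetical
    'http://www.example.com//abc/?jkl=mno',           // hypothetical
];

foreach ($urls as $url) {
    echo collapseSlashesBeforeQuery($url), "\n";
}
// https://www.example.com/abc/def/ghi/?jkl=mno//pqr   (first three URLs)
// https://www.example.com/abc/?jkl=mno//pqr?x=1
// http://www.example.com/abc/?jkl=mno
```

Unlike the lookahead approach, nothing here names a particular scheme, and the query string is left alone no matter how many ? characters it contains.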

Please note that you may need (?!^) before \G if the first alternative might fail to match at the start of the string, e.g. for ://example.com. Without it, \G also matches at offset 0 and the second alternative is applied there.
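
To make the note concrete, here is a sketch comparing both patterns on a hypothetical malformed input, ://example.com//abc/?q=1//x, where ^\w+:/ cannot match. Without the guard, \G matches at offset 0 and the leading :// becomes :/. With (?!^), no match can start at offset 0, and since \G only matches where the previous match ended, no later match can start either, so the string is returned unchanged.

```php
<?php
// Original pattern versus the variant with (?!^) placed before \G.
$plain   = '@(^\w+:/|\G[^?/]*)/+@';
$guarded = '@(^\w+:/|(?!^)\G[^?/]*)/+@';

// Hypothetical input whose "scheme" has no word characters, so the first
// alternative (^\w+:/) cannot match at offset 0.
$odd = '://example.com//abc/?q=1//x';

echo preg_replace($plain, '$1/', $odd), "\n";   // :/example.com/abc/?q=1//x
echo preg_replace($guarded, '$1/', $odd), "\n"; // ://example.com//abc/?q=1//x

// A normal URL is handled the same way by both patterns.
$url = 'https://www.example.com//abc//def//ghi/?jkl=mno//pqr';
echo preg_replace($guarded, '$1/', $url), "\n"; // https://www.example.com/abc/def/ghi/?jkl=mno//pqr
```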

Nick

You can use a positive lookahead to ensure that the // is followed, somewhere later in the string, by a ?:

$urls = array('https://www.example.com//abc/def/ghi/?jkl=mno//pqr',
'https://www.example.com//abc/def//ghi/?jkl=mno//pqr',
'https://www.example.com//abc//def//ghi/?jkl=mno//pqr');
foreach ($urls as $url)
    echo preg_replace('#//(?=.*\?)#', '/', $url) . "\n";

Output:

https:/www.example.com/abc/def/ghi/?jkl=mno//pqr
https:/www.example.com/abc/def/ghi/?jkl=mno//pqr
https:/www.example.com/abc/def/ghi/?jkl=mno//pqr

Edit

As @revo points out, this also removes the // after https:. To avoid that, add a negative lookbehind:

foreach ($urls as $url)
    echo preg_replace('#(?<!https:)//(?=.*\?)#', '/', $url) . "\n";

Output:

https://www.example.com/abc/def/ghi/?jkl=mno//pqr
https://www.example.com/abc/def/ghi/?jkl=mno//pqr
https://www.example.com/abc/def/ghi/?jkl=mno//pqr
