Reputation: 20654
Is it possible to use preg_replace to replace all occurrences of a pattern until a specified delimiter?
I want to replace multiple occurrences of a pattern, not the whole string before the delimiter.
Is it possible to do this in a single step, without splitting the string? Is it possible to specify that the position pointer should be reset to the beginning after every replacement? Can I use lookahead to achieve this?
For example, I want to replace all occurrences of // in the following URLs, up to the ? character.
Input:
https://www.example.com//abc/def/ghi/?jkl=mno//pqr
https://www.example.com//abc/def//ghi/?jkl=mno//pqr
https://www.example.com//abc//def//ghi/?jkl=mno//pqr
Expected output (the same for all three inputs):
https://www.example.com/abc/def/ghi/?jkl=mno//pqr
Please note that the // after the delimiter ? in the subject string is left untouched.
Upvotes: 0
Views: 177
Reputation: 48711
The current accepted answer works as a solution, but it has two issues that may cause problems later:
It does not actually stop matching at the first occurrence of ?.
It only works for the https protocol (other protocols have to be added to the lookbehind manually).
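Both issues can be reproduced with a short script. This is only an illustrative sketch: the pattern is the lookbehind version from the accepted answer, while the two test URLs and the variable names are made up for the demonstration.

```php
<?php
// Pattern from the accepted answer, reproduced here to show its two limits.
$pattern = '#(?<!https:)//(?=.*\?)#';

// 1. A second "?" later in the string (hypothetical input): the greedy
//    lookahead still succeeds for the "//" that sits after the first "?".
$twoMarks = 'https://www.example.com//abc/?jkl=mno//pqr?stu';
$afterTwoMarks = preg_replace($pattern, '/', $twoMarks);

// 2. A scheme other than https (hypothetical input): the lookbehind
//    no longer protects the "//" of "http://".
$plainHttp = 'http://www.example.com//abc/?jkl=mno//pqr';
$afterPlainHttp = preg_replace($pattern, '/', $plainHttp);

echo $afterTwoMarks, "\n";   // https://www.example.com/abc/?jkl=mno/pqr?stu
echo $afterPlainHttp, "\n";  // http:/www.example.com/abc/?jkl=mno//pqr
```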
Regex:
(^\w+:/|\G[^?/]*)/+
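The regex above uses \G, which matches at the position where the previous match ended. Each match must therefore start exactly where the previous one stopped, and since [^?/]* cannot step over a ?, matching cannot continue once a ? is reached.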
PHP:
echo preg_replace('@(^\w+:/|\G[^?/]*)/+@', '$1/', $url);
Please note that you may need (?!^) before \G if the first side of the alternation might fail to match at the start of the string, e.g. in ://example.com
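As a rough sketch of how the guarded pattern behaves, the regex can be wrapped in a small helper. The function name collapseSlashesBeforeQuery and the extra test inputs are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical helper: collapses runs of "/" in the part of $url
// that precedes the first "?".
function collapseSlashesBeforeQuery(string $url): string
{
    // ^\w+:/    consumes "scheme:/" so the replacement keeps "scheme://"
    // (?!^)\G   continues only from where the previous match ended,
    //           never from the very start of the string
    // [^?/]*    walks up to the next "/" and cannot step over a "?"
    // /+        the run of slashes that is replaced by a single "/"
    $result = preg_replace('@(^\w+:/|(?!^)\G[^?/]*)/+@', '$1/', $url);

    // preg_replace returns null on a PCRE error; fall back to the input.
    return $result ?? $url;
}

echo collapseSlashesBeforeQuery('https://www.example.com//abc//def//ghi/?jkl=mno//pqr'), "\n";
// https://www.example.com/abc/def/ghi/?jkl=mno//pqr

echo collapseSlashesBeforeQuery('https://www.example.com//abc/?jkl=mno//pqr?stu'), "\n";
// https://www.example.com/abc/?jkl=mno//pqr?stu
```

With the guard in place, a string such as ://example.com//abc is left completely unchanged, because no match can start at position 0. Without the guard, \G would match there and the leading :// would be reduced to :/.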
Upvotes: 3
Reputation: 147146
You can use a positive lookahead to ensure that the // is followed by a ?:
$urls = array('https://www.example.com//abc/def/ghi/?jkl=mno//pqr',
'https://www.example.com//abc/def//ghi/?jkl=mno//pqr',
'https://www.example.com//abc//def//ghi/?jkl=mno//pqr');
foreach ($urls as $url)
echo preg_replace('#//(?=.*\?)#', '/', $url) . "\n";
Output:
https:/www.example.com/abc/def/ghi/?jkl=mno//pqr
https:/www.example.com/abc/def/ghi/?jkl=mno//pqr
https:/www.example.com/abc/def/ghi/?jkl=mno//pqr
Edit
As @revo points out, this also collapses the // after https:. To avoid that, add a negative lookbehind:
foreach ($urls as $url)
echo preg_replace('#(?<!https:)//(?=.*\?)#', '/', $url) . "\n"; // the pattern has no capture group, so the replacement is a plain '/'
Output:
https://www.example.com/abc/def/ghi/?jkl=mno//pqr
https://www.example.com/abc/def/ghi/?jkl=mno//pqr
https://www.example.com/abc/def/ghi/?jkl=mno//pqr
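If schemes other than https need to be supported, one possible variation is to look behind for the colon only and to match runs of two or more slashes. This is a sketch under the assumption that a : never appears directly before a slash run in the path. The helper name collapseWithLookbehind and the test URLs are hypothetical.

```php
<?php
// Hypothetical helper built on the lookbehind idea above.
function collapseWithLookbehind(string $url): string
{
    // (?<!:)    skip a run of slashes that directly follows ":" ("scheme://")
    // /{2,}     two or more slashes, so "///" also collapses to a single "/"
    // (?=.*\?)  require a "?" somewhere later in the string
    $result = preg_replace('#(?<!:)/{2,}(?=.*\?)#', '/', $url);

    // preg_replace returns null on a PCRE error; fall back to the input.
    return $result ?? $url;
}

$urls = array('http://www.example.com//abc///def/?jkl=mno//pqr',
              'ftp://host//a//b/?q=c//d');
foreach ($urls as $url)
    echo collapseWithLookbehind($url) . "\n";
```

Because the lookahead requires a ? somewhere ahead, a URL without any ? is left untouched, and a // between two ? characters is still replaced.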
Upvotes: 3