Reputation: 23
I would like to know whether a single regex search/replace pattern can replace all occurrences of a specific character inside a string enclosed between two tokens.
For example, is it possible to replace all periods with spaces for the text between TOKEN1 & TOKEN2 as in the example below?
So that:
TOKEN1:Run.Spot.run:TOKEN2
is changed to:
TOKEN1:Run Spot run:TOKEN2
NOTE: The regular expression would need to be capable of replacing any number of periods within any text, and not just the specific pattern above.
I ask this question more for my personal knowledge, as it is something I have wanted to do quite a few times in the past with various regex implementations. In this particular case, however, the regex would be in PHP.
I am not interested in PHP workarounds, as I already know how to do that. I am trying to expand my knowledge of regex.
Thanks
Upvotes: 2
Views: 2101
Reputation: 89557
A way to do this:
$pattern = '~(?:TOKEN1:|\G(?!^))(?:[^:.]+|:(?!TOKEN2))*\K\.~';
$replacement = ' ';
$subject = 'TOKEN1:Run.Spot.run:TOKEN2';
$result = preg_replace($pattern, $replacement, $subject);
pattern details:
~ # pattern delimiter
(?: # open a non capturing group
TOKEN1: # TOKEN1:
| # OR
\G(?!^) # a contiguous match but not at the start of the string
) # close the non capturing group
(?: # open a non capturing group
[^:.]+ # all that is not the first character of :TOKEN2 or the searched character
| # OR
:(?!TOKEN2) # The first character of :TOKEN2 not followed by the other characters
)* # repeat the non capturing group zero or more times
\K # reset the match
\. # the searched character
~ # delimiter
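The two less common constructs in this pattern, \K and \G, can be tried in isolation. The snippet below is only a minimal sketch with made-up subjects ('foobar foobaz bar' and 'aaabaa' are illustrative strings, not part of the question):

```php
<?php
// \K drops everything matched so far from the reported match,
// so only "bar" is replaced and the preceding "foo" is kept.
$kept = preg_replace('~foo\Kbar~', 'X', 'foobar foobaz bar');

// \G only matches where the previous match ended (or at the start of
// the subject for the first attempt), so the chain stops at the "b".
$chained = preg_replace('~\Ga~', 'X', 'aaabaa');

echo $kept, "\n";    // fooX foobaz bar
echo $chained, "\n"; // XXXbaa
```

In the full pattern, \G(?!^) plays the role of the second call (continue from the previous period), and \K plays the role of the first (keep everything before the period untouched).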
The idea is to use \G to force each match to start either at TOKEN1: or immediately after the previous match.
Note: the default behavior is like an HTML tag (it stays open until it is closed). If :TOKEN2 is not found, all the . characters after TOKEN1: will be replaced.
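To see how this behaves on other inputs, here is a small sketch that runs the same pattern over a few extra subjects. The sample strings are assumptions chosen for illustration; only the pattern comes from the answer.

```php
<?php
$pattern = '~(?:TOKEN1:|\G(?!^))(?:[^:.]+|:(?!TOKEN2))*\K\.~';

$samples = [
    'a.b TOKEN1:Run.Spot.run:TOKEN2 c.d',      // periods outside the tokens
    'TOKEN1:a.b:TOKEN2 x.y TOKEN1:c.d:TOKEN2', // two separate token pairs
    'TOKEN1:a:b.c:TOKEN2',                     // a colon that does not start :TOKEN2
    'TOKEN1:a.b.c and d.e',                    // no closing token at all
];

$results = [];
foreach ($samples as $subject) {
    $results[] = preg_replace($pattern, ' ', $subject);
}

print_r($results);
```

This should give 'a.b TOKEN1:Run Spot run:TOKEN2 c.d', 'TOKEN1:a b:TOKEN2 x.y TOKEN1:c d:TOKEN2', 'TOKEN1:a:b c:TOKEN2' and 'TOKEN1:a b c and d e'. The last one shows the "open until closed" behavior described above.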
Upvotes: 4
Reputation: 3991
At its simplest, you would need an escaped period as your pattern, \. (the backslash is required because an unescaped period matches any character), and you would replace it with a space.
This will replace all instances of . with a space.
However, from your comment, you appear to be asking for a regex to replace all periods between word characters:
(?<=\w)\.(?=\w)
You would need a positive (zero-width, non-capturing) lookbehind for a word character, (?<=\w), your escaped period, \., and a positive (zero-width, non-capturing) lookahead for a word character, (?=\w). Replacing this with a space would have the result you want.
If you want to replace periods only between tokens, you could prepend a positive lookbehind, (?<=TOKEN1:.+), and append a positive lookahead, (?=.+TOKEN2), so the complete regex would be:
(?<=TOKEN1:.+)(?<=\w)\.(?=\w)(?=.+TOKEN2)
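You may need to refine this if a period can occur immediately after the opening token and/or immediately before the closing token and you don't want to replace them. Note also that (?<=TOKEN1:.+) is an unbounded variable-length lookbehind: engines such as .NET accept it, but PCRE, which PHP's preg functions use, rejects it.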
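For PHP specifically, a lookahead-only variant may be easier to try. The sketch below swaps in a different technique from the one in this answer (a "tempered" lookahead): a period qualifies if :TOKEN2 can be reached ahead of it without crossing another TOKEN1:. It assumes the tokens are balanced and not nested; a stray :TOKEN2 with no opener would also trigger replacements before it.

```php
<?php
// Swapped-in technique, not the lookbehind approach from the answer:
// replace a period only if :TOKEN2 lies ahead with no TOKEN1: in between.
$pattern = '~\.(?=(?:(?!TOKEN1:).)*:TOKEN2)~s';

$result = preg_replace($pattern, ' ', 'a.b TOKEN1:Run.Spot.run:TOKEN2 c.d');

echo $result, "\n"; // a.b TOKEN1:Run Spot run:TOKEN2 c.d
```

Unlike the \G approach, this version leaves periods alone when the closing token is missing.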
Upvotes: 0
Reputation: 183321
I think the best way is to write something like this:
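$result =
    preg_replace_callback(
        // PHP's preg functions take no /g flag; every match is replaced by default.
        '/(TOKEN1:)([^:]+)(:TOKEN2)/',
        function ($matches) {
            // $matches[0] is the whole match; the capture groups start at index 1.
            return $matches[1]
                . preg_replace('/[.]/', ' ', $matches[2])
                . $matches[3];
        },
        'TOKEN1:Run.Spot.run:TOKEN2'
    );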
(Disclaimer: not tested.)
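The same callback idea can be wrapped in a reusable function. This is only a sketch: replace_between is a hypothetical helper name, and it uses a lazy .*? instead of [^:]+ so that colons inside the enclosed text are allowed.

```php
<?php
// Hypothetical helper: replace $search with $replace only in text that
// sits between $open and $close; everything else is left untouched.
function replace_between(
    string $subject,
    string $open,
    string $close,
    string $search,
    string $replace
): string {
    // preg_quote() escapes regex metacharacters and the ~ delimiter.
    $pattern = '~(' . preg_quote($open, '~') . ')(.*?)('
        . preg_quote($close, '~') . ')~s';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($search, $replace): string {
            // $m[1] = opening token, $m[2] = enclosed text, $m[3] = closing token
            return $m[1] . str_replace($search, $replace, $m[2]) . $m[3];
        },
        $subject
    );
}

echo replace_between('a.b TOKEN1:Run.Spot.run:TOKEN2 c.d', 'TOKEN1:', ':TOKEN2', '.', ' '), "\n";
// a.b TOKEN1:Run Spot run:TOKEN2 c.d
```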
Upvotes: 0