Reputation: 1636
I have this bit of regex, used in a PHP preg_replace call, to strip out the whitespace that follows ":" and "(":
([\(:])\s+
The problem I'm running into is that it ends up stripping out spaces I need that are within quotes. For example, this string:
img[style*="float: left"]
Is there a way to write the regex so it will match any ":" or "(" unless it is enclosed in double quotes?
Upvotes: 3
Views: 224
Reputation: 15010
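This routine will match every double-quoted string and put it back unchanged, and strip the whitespace after any ":" or "(" that sits outside the quotes.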
Code
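<?php
$string = 'img[style*="float: left"]
img: [style*="float: left"]
img( [style*="float: left"]
';

// First alternative consumes quoted strings; second captures ":" or "(" plus trailing whitespace.
$regex = '/"[^"]*"|([:(])\s+/ims';

$output = preg_replace_callback(
    $regex,
    function ($matches) {
        // Group 1 exists only when the ":" or "(" branch matched.
        if (array_key_exists(1, $matches)) {
            return $matches[1];
        }
        // Otherwise a quoted string matched: return it unchanged.
        return $matches[0];
    },
    $string
);

echo "this is the output:" . $output;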
Output
this is the output:img[style*="float: left"]
img:[style*="float: left"]
img([style*="float: left"]
Upvotes: 1
Reputation: 89639
You can try this:
$text = preg_replace('~(?|(\\\{2}|\\\"|"(?>[^"\\\]+|\\\{2}|\\\")*+")|([:(])\s+)~', '$1', $text);
The idea is to match the double-quoted parts before ([:(])\s+ gets a chance to match, and to replace them with themselves.
To avoid matching escaped quotes, backslashes are matched first.
pattern details:
~                                    # pattern delimiter
(?|                                  # branch reset: capture groups in each branch share the same number
    (                                # open a capturing group
        \\\{2}                       # two backslashes (an escaped backslash, which cannot escape what follows)
      |                              # OR
        \\\"                         # an escaped double quote
      |                              # OR
        "(?>[^"\\\]+|\\\{2}|\\\")*+" # a double-quoted string and its content
    )                                # close the capturing group
  |                                  # OR
    ( [:(] )                         # a : or a ( in a capturing group
    \s+                              # whitespace
)                                    # close the branch reset group
~                                    # pattern delimiter
The benefit is that it handles situations like these:
img: " : \" ( "
img: \" : ( " ( "
img: \\" : ( " ( "
result:
img:" : \" ( "
img:\" :(" ( "
img:\\" : ( " ("
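To try the three cases above yourself, here is a minimal sketch. The pattern is copied from the preg_replace call above; the function name strip_outside_quotes and the nowdoc input are only illustrative assumptions. A nowdoc is used so that the backslashes and quotes in the test input reach the regex unchanged.

```php
<?php
// Hypothetical wrapper around the pattern shown above.
function strip_outside_quotes(string $text): string
{
    // Quoted strings and escaped characters are captured and re-emitted as-is;
    // a ":" or "(" outside quotes is re-emitted without its trailing whitespace.
    $pattern = '~(?|(\\\{2}|\\\"|"(?>[^"\\\]+|\\\{2}|\\\")*+")|([:(])\s+)~';
    return preg_replace($pattern, '$1', $text);
}

// Nowdoc: no escape processing, so every backslash below is literal.
$input = <<<'TXT'
img: " : \" ( "
img: \" : ( " ( "
img: \\" : ( " ( "
TXT;

echo strip_outside_quotes($input), PHP_EOL;
```

The printed lines should correspond to the result listed above.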
Upvotes: 1
Reputation: 666
There are two ways to go about this:
You can use negative lookarounds to try to assert that there is no double quote before or after the text you don't want stripped. The problem I have with this is that there is no telling how far from the quotes the ":" or "(" might be, and lookbehinds in PCRE cannot be of unknown length.
What I like to do instead is to "preserve" anything enclosed in double quotes: match it with the regex \"[^"]+\", store each match in an array, and replace it with a placeholder string (I use "THIS_IS_A_QUOTE"). Once all the quotes are stored in the array, strip the spaces, and finally replace each "THIS_IS_A_QUOTE" placeholder with the corresponding string from the array.
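The preserve, strip and restore steps could be sketched roughly as follows. This is only one possible reading of the description above: the function name strip_with_placeholders is hypothetical, the placeholder carries an index so each quote can be restored to its own position, and the quote pattern uses * rather than + so that empty quotes are preserved too. It assumes the placeholder text never occurs in the input and that quotes are not escaped.

```php
<?php
// Hypothetical helper illustrating the placeholder approach.
function strip_with_placeholders(string $text): string
{
    $quotes = [];

    // 1. Stash every double-quoted run and leave an indexed placeholder behind.
    $masked = preg_replace_callback(
        '/"[^"]*"/',
        function (array $m) use (&$quotes): string {
            $quotes[] = $m[0];
            return 'THIS_IS_A_QUOTE_' . (count($quotes) - 1) . '_';
        },
        $text
    );

    // 2. Strip whitespace after ":" or "(" in the remaining, unquoted text.
    $stripped = preg_replace('/([:(])\s+/', '$1', $masked);

    // 3. Swap each placeholder back for the quoted string it stands for.
    return preg_replace_callback(
        '/THIS_IS_A_QUOTE_(\d+)_/',
        function (array $m) use ($quotes): string {
            return $quotes[(int) $m[1]];
        },
        $stripped
    );
}

echo strip_with_placeholders('img: [style*="float: left"]'), PHP_EOL;
// prints: img:[style*="float: left"]
```

Compared with a single regex, this takes three passes over the string, but each pattern stays simple enough to read at a glance.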
Upvotes: 1