I have short strings like this:

```php
$str = 'abc | xx ?? "1 x \' 3" d e f \' y " 5 \' x yz';
```

I want to remove all spaces from a string that are not enclosed in single or double quotes. Any characters enclosed in single or double quotes should not be changed. As a result, I expect:

```php
$expected = 'abc|xx??"1 x \' 3"def\' y " 5 \'xyz';
```
My current solution based on character-wise comparisons is the following:
```php
function removeSpaces($string){
    $ret = $stop = "";
    for($i=0; $i < strlen($string);$i++){
        $char = $string[$i];
        if($stop == "") {
            if($char == " ") continue;
            if($char == "'" OR $char == '"') $stop = $char;
        }
        else {
            if($char == $stop) $stop = "";
        }
        $ret .= $char;
    }
    return $ret;
}
```
Is there a smarter solution?
You could capture either `"` or `'` in a group and then consume either escaped characters or any other characters until encountering the matching closing `'` or `"`, using the backreference `\1`:

```
(?<!\\)(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+
```
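As a minimal sketch (the variable names are only for illustration), the pattern above can be passed to `preg_replace()` with an empty replacement. Inside a single-quoted PHP string, each regex `\\` has to be written as `\\\\` and the `'` in the character class as `\'`:

```php
<?php
// Apply the skip-quoted-sections pattern to the question's example string.
$str = 'abc | xx ?? "1 x \' 3" d e f \' y " 5 \' x yz';

// '\\\\' becomes the regex \\ (one literal backslash); \1 and \h pass through unchanged.
$pattern = '/(?<!\\\\)([\'"])(?:(?!(?:\1|\\\\)).|\\\\.)*+\1(*SKIP)(*FAIL)|\h+/';

// Quoted sections are matched and then skipped; only the \h+ alternative is removed.
$result = preg_replace($pattern, '', $str);
echo $result, "\n"; // abc|xx??"1 x ' 3"def' y " 5 'xyz
```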
Explanation

- `(?<!\\)` Negative lookbehind, assert there is no `\` directly to the left
- `(['"])` Capture group 1, match either `'` or `"`
- `(?:` Non-capture group
  - `(?!(?:\1|\\)).` If what is directly to the right is neither the value in group 1 nor a backslash, match any char except a newline
  - `|` Or
  - `\\.` Match an escaped character
- `)*+` Close the non-capture group and repeat it 0+ times (possessive)
- `\1` Backreference to what is captured in group 1 (matches the same `'` or `"`)
- `(*SKIP)(*FAIL)` Skip the match so far, so the next attempt starts after the closing quote. Read more about (*SKIP)(*FAIL)
- `|` Or
- `\h+` Match 1+ horizontal whitespace chars that you want to remove

As @Wiktor Stribiżew points out in his comment:

> In some rare situations, this might match at a wrong position, namely, if there is a literal backslash (not an escaping one) before a single/double quoted string that should be skipped. You need to add `(?:\\{2})*` after `(?<!\\)`.
The pattern would then be:

```
(?<!\\)(?:\\{2})*(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+
```
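To see what the added `(?:\\{2})*` changes, here is a small comparison sketch. The subject `x\\"a b" c` is made up for illustration: it has two literal backslashes right before the quoted part.

```php
<?php
// First pattern, and the one that also consumes pairs of literal backslashes.
$original = '/(?<!\\\\)([\'"])(?:(?!(?:\1|\\\\)).|\\\\.)*+\1(*SKIP)(*FAIL)|\h+/';
$improved = '/(?<!\\\\)(?:\\\\{2})*([\'"])(?:(?!(?:\1|\\\\)).|\\\\.)*+\1(*SKIP)(*FAIL)|\h+/';

// The actual characters are: x\\"a b" c
$str = 'x\\\\"a b" c';

echo preg_replace($original, '', $str), "\n"; // x\\"ab"c  (space inside the quotes removed)
echo preg_replace($improved, '', $str), "\n"; // x\\"a b"c (quoted part left untouched)
```

With the first pattern, the lookbehind rejects the opening `"` because a backslash precedes it, so that quoted section is never skipped and its space is removed.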
Here is a 3 step approach:
```php
$str = 'abc | xx ?? "1 x \' 3" d e f \' y " 5 \' x yz';
echo 'input: ' . $str . "\n";
$result = preg_replace_callback( // replace spaces in quote sections with placeholder
    '|(["\'])(.*?)(\1)|',
    function ($matches) {
        $s = preg_replace('/ /', "\x01", $matches[2]);
        return $matches[1] . $s . $matches[3];
    },
    $str
);
$result = preg_replace('/ /', '', $result); // remove all spaces
$result = preg_replace('/\x01/', ' ', $result); // restore spaces in quote sections
echo 'result: ' . $result;
echo "\nexpect: " . 'abc|xx??"1 x \' 3"def\' y " 5 \'xyz';
```
Output:

```
input: abc | xx ?? "1 x ' 3" d e f ' y " 5 ' x yz
result: abc|xx??"1 x ' 3"def' y " 5 'xyz
expect: abc|xx??"1 x ' 3"def' y " 5 'xyz
```
Explanation:

1. `preg_replace_callback()` to mask the spaces inside quote sections:
   - `'|(["\'])(.*?)(\1)|'` matches quote sections starting and ending with either `"` or `'`
   - `(\1)` makes sure the closing quote is the same as the opening one (either `"` or `'`)
   - `preg_replace()` replaces all spaces in the section with a non-printable placeholder `"\x01"`
2. `preg_replace()` to remove all spaces; the spaces in quote sections are now `"\x01"`, so they are not removed
3. `preg_replace()` to restore all spaces from the placeholder `"\x01"`
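A possible variant of the same three steps, assuming the placeholder `"\x01"` never occurs in the input: the two fixed-character replacements don't need a regex, so this sketch uses `str_replace()` for them and keeps `preg_replace_callback()` only for finding the quoted sections. Like the code above, it does not handle escaped quotes.

```php
<?php
// Assumption: "\x01" never appears in the input, otherwise step 3 would corrupt it.
$str = 'abc | xx ?? "1 x \' 3" d e f \' y " 5 \' x yz';

// Step 1: inside each quoted section, turn spaces into the placeholder.
$masked = preg_replace_callback(
    '/(["\'])(.*?)\1/',
    function (array $m): string {
        // $m[1] is the opening quote; the closing quote is the same character.
        return $m[1] . str_replace(' ', "\x01", $m[2]) . $m[1];
    },
    $str
);

// Step 2: remove every remaining (unquoted) space.
$stripped = str_replace(' ', '', $masked);

// Step 3: turn the placeholders back into spaces.
$result = str_replace("\x01", ' ', $stripped);

echo $result, "\n"; // abc|xx??"1 x ' 3"def' y " 5 'xyz
```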
You can use

```php
preg_replace('~(?<!\\\\)(?:\\\\{2})*(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')(*SKIP)(?!)|\s+~s', '', $str)
```
See the PHP demo and a regex demo.
Details

- `(?<!\\)(?:\\{2})*` - a check that there is no escaping `\` immediately on the left: any number of double backslashes not preceded by a `\`
- `(?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')` - either a double- or single-quoted string literal allowing escape sequences
- `(*SKIP)(?!)` - skip the match and start a new search from the location where the regex failed
- `|` - or
- `\s+` - 1 or more whitespaces

Note that in a single-quoted PHP string literal, a backslash is used to form string escape sequences, so a literal backslash is "coded" as a double backslash. To match a literal backslash in the text, the regex needs two such backslashes, hence `\\\\` is used.
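The quadrupled backslashes can be avoided by writing the pattern in a nowdoc (`<<<'RE'`, available since PHP 5.3), where neither backslashes nor quotes are processed, so the regex reads exactly as in the Details list. A sketch, using the question's string:

```php
<?php
$str = 'abc | xx ?? "1 x \' 3" d e f \' y " 5 \' x yz';

// Nowdoc: the body is taken literally, so \\ here is already the regex \\.
$pattern = <<<'RE'
~(?<!\\)(?:\\{2})*(?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')(*SKIP)(?!)|\s+~s
RE;

echo preg_replace($pattern, '', $str), "\n"; // abc|xx??"1 x ' 3"def' y " 5 'xyz
```

The resulting `$pattern` is the same string as the single-quoted literal in the answer; only the way it is written in the source differs.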