jspit

Reputation: 7693

Remove all spaces from a string that are not enclosed in single quotes or double quotes

I have short strings like this

$str = 'abc | xx ??   "1 x \' 3" d e f \' y " 5 \' x yz';

I want to remove all spaces from a string that are not enclosed in single or double quotes. Any characters enclosed in single or double quotes should not be changed. As a result, I expect:

$expected =  'abc|xx??"1 x \' 3"def\' y " 5 \'xyz';

My current solution based on character-wise comparisons is the following:

function removeSpaces($string){
  $ret = $stop = "";
  for($i=0; $i < strlen($string);$i++){
    $char = $string[$i];
    if($stop == "") {
      if($char == " ") continue;
      if($char == "'" OR $char == '"') $stop = $char;
    }
    else {
      if($char == $stop) $stop = "";
    }
    $ret .= $char;
  }
  return $ret;
}

Is there a solution that is smarter?

Upvotes: 1

Views: 353

Answers (3)

The fourth bird

Reputation: 163277

You could capture either " or ' in a group and consume any escaped variants of each until encountering the matching closing ' or ", using the backreference \1:

(?<!\\)(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+

Regex demo | Php demo

Explanation

  • (?<!\\) Negative lookbehind, assert not a \ directly to the left
  • (['"]) capture group 1, match either ' or "
  • (?: Non capture group
    • (?!(?:\1|\\)). Negative lookahead: assert that what is directly to the right is neither the value of group 1 nor a backslash, then match any char except a newline
    • | Or
    • \\. Match an escaped character
  • )*+ Close non capture group and repeat 0+ times, possessively (no backtracking into the group)
  • \1 Backreference to what is captured in group 1 (match up either ' or ")
  • (*SKIP)(*FAIL) Discard what was matched so far and resume searching after it, so a quoted section is never part of a match
  • | Or
  • \h+ Match 1+ horizontal whitespace chars that you want to remove
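
A minimal sketch of how the pattern above could be applied with preg_replace(), using the string from the question. The ~ delimiters, the nowdoc wrapper and the variable names are choices made for this example and are not taken from the linked demos.

```php
<?php
// Nowdoc keeps every backslash literal, so the pattern needs no extra PHP escaping.
$pattern = <<<'REGEX'
~(?<!\\)(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+~
REGEX;

$str = 'abc | xx ??   "1 x \' 3" d e f \' y " 5 \' x yz';

// Quoted sections are skipped; only whitespace outside them is replaced.
$result = preg_replace($pattern, '', $str);

echo $result, "\n"; // abc|xx??"1 x ' 3"def' y " 5 'xyz
```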

As @Wiktor Stribiżew points out in his comment:

In some rare situations, this might match at a wrong position, namely, if there is a literal backslash (not an escaping one) before a single/double quoted string that should be skipped. You need to add (?:\\{2})* after (?<!\\)

The pattern would then be:

(?<!\\)(?:\\{2})*(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+
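
To illustrate the difference, here is a small sketch that runs both patterns on a made-up input in which an escaped backslash (two backslash characters) sits directly before a quoted section. The input string and variable names are assumptions for this example only.

```php
<?php
// Both patterns are written as nowdoc so that backslashes stay literal.
$original = <<<'REGEX'
~(?<!\\)(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+~
REGEX;
$amended = <<<'REGEX'
~(?<!\\)(?:\\{2})*(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+~
REGEX;

// Hypothetical input: two backslash characters directly before a quoted section.
$input = <<<'TXT'
a \\"b c" d
TXT;

$withOriginal = preg_replace($original, '', $input);
$withAmended  = preg_replace($amended, '', $input);

echo $withOriginal, "\n"; // a\\"bc"d   (the space inside the quotes is lost)
echo $withAmended, "\n";  // a\\"b c"d  (the quoted section is preserved)
```

With the original pattern the lookbehind rejects the opening quote because a backslash precedes it, so the quoted section is not recognised; the amended pattern consumes the backslash pair first.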

Regex demo

Upvotes: 1

Peter Thoeny

Reputation: 7616

Here is a 3 step approach:

  1. replace spaces in quote sections with placeholder
  2. remove all spaces
  3. restore spaces in quote sections
    $str = 'abc | xx ??   "1 x \' 3" d e f \' y " 5 \' x yz';
    echo 'input:  ' . $str . "\n";
    $result = preg_replace_callback( // replace spaces in quote sections with placeholder
        '|(["\'])(.*?)(\1)|',
        function ($matches) {
            $s = preg_replace('/ /', "\x01", $matches[2]);
            return $matches[1] . $s . $matches[3];
        },
        $str
    );
    $result = preg_replace('/ /', '', $result);     // remove all spaces
    $result = preg_replace('/\x01/', ' ', $result); // restore spaces in quote sections
    echo 'result: ' . $result;
    echo "\nexpect: " . 'abc|xx??"1 x \' 3"def\' y " 5 \'xyz';

Output:

input:  abc | xx ??   "1 x ' 3" d e f ' y " 5 ' x yz
result: abc|xx??"1 x ' 3"def' y " 5 'xyz
expect: abc|xx??"1 x ' 3"def' y " 5 'xyz

Explanation:

  1. replace spaces in quote sections with placeholder
    • use preg_replace_callback()
    • '|(["\'])(.*?)(\1)|' matches quote sections starting and ending with either " or '
    • the (\1) makes sure the closing quote is the same character as the opening one (either " or ')
    • within the callback, use preg_replace() to replace all spaces with the non-printable placeholder "\x01"
  2. remove all spaces
    • use preg_replace() to remove all spaces
    • the pattern does not match the placeholder "\x01", so the spaces in quote sections are left alone
  3. restore spaces in quote sections
    • use preg_replace() to turn every placeholder "\x01" back into a space
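
The three steps could also be wrapped in a reusable function, roughly as sketched below. The function name is hypothetical, str_replace() is swapped in for the plain-space preg_replace() calls since no regex is needed there, and the sketch assumes the placeholder "\x01" never occurs in the input.

```php
<?php
// Hypothetical helper; assumes "\x01" does not appear in $str.
function removeSpacesOutsideQuotes(string $str): string
{
    // Step 1: protect spaces inside quote sections with the placeholder.
    $protected = preg_replace_callback(
        '|(["\'])(.*?)(\1)|',
        function (array $m): string {
            return $m[1] . str_replace(' ', "\x01", $m[2]) . $m[3];
        },
        $str
    );
    // Step 2: remove the remaining (unquoted) spaces.
    $stripped = str_replace(' ', '', $protected);
    // Step 3: turn the placeholders back into spaces.
    return str_replace("\x01", ' ', $stripped);
}

echo removeSpacesOutsideQuotes('abc | xx ??   "1 x \' 3" d e f \' y " 5 \' x yz'), "\n";
// abc|xx??"1 x ' 3"def' y " 5 'xyz
```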

Upvotes: 0

Wiktor Stribiżew

Reputation: 626748

You can use

preg_replace('~(?<!\\\\)(?:\\\\{2})*(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')(*SKIP)(?!)|\s+~s', '', $str)

See the PHP demo and a regex demo.

Details

  • (?<!\\)(?:\\{2})* - a check that the quote is not escaped: any number of backslash pairs that are not themselves preceded by a \
  • (?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*') - either a double- or single-quoted string literal allowing escape sequences
  • (*SKIP)(?!) - skip the match and start a new search from the location where the regex failed
  • | - or
  • \s+ - 1 or more whitespace characters.

Note that in a single-quoted PHP string literal the backslash is itself an escape character, so a literal backslash is written as two backslashes. The regex engine in turn needs two backslashes to match one literal backslash in the text, hence "\\\\" is used in the PHP code.
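
As a quick check, the call above could be wrapped and run against the string from the question plus one input with an escaped quote. This is only a sketch: the wrapper name and the second input are assumptions made for the example.

```php
<?php
// Hypothetical wrapper around the preg_replace() call from the answer.
function stripUnquotedWhitespace(string $str): string
{
    return preg_replace(
        '~(?<!\\\\)(?:\\\\{2})*(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')(*SKIP)(?!)|\s+~s',
        '',
        $str
    );
}

echo stripUnquotedWhitespace('abc | xx ??   "1 x \' 3" d e f \' y " 5 \' x yz'), "\n";
// abc|xx??"1 x ' 3"def' y " 5 'xyz

// An escaped double quote inside a double-quoted section does not end the section.
echo stripUnquotedWhitespace('a "b \\" c" d'), "\n";
// a"b \" c"d
```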

Upvotes: 2
