jav974

Reputation: 1042

php regex unknown whitespace

I am having a hard time figuring out why this regex, which should match runs of multiple whitespace characters, does not match:

$str = '   ';

if (preg_match_all('/\s{2,}/', $str, $matches)) {
    var_dump($matches);
}

If I replace the value of $str with three "real" spaces, it works as expected, so evidently the characters in $str (copied and pasted from another source) are not all ordinary whitespace. I still need to match them so that I can replace them with real spaces or something else.

My question: what are those space-looking characters in $str and, more importantly, how do I target them in a regex?

Upvotes: 0

Views: 346

Answers (2)

arty

Reputation: 11

The whitespace characters matched by \s include the regular space (0x20), horizontal tab (0x09), carriage return (0x0D), line feed (0x0A), and form feed (0x0C). So if you want to turn all of these characters into real spaces, you can use this line:

$str = preg_replace('/\s/', ' ', $str);

Or, if you want to replace a sequence of two or more whitespace characters with just a single real space, use this instead:

$str = preg_replace('/\s{2,}/', ' ', $str);
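
For the string in the question, these patterns would probably need the u modifier as well, since the byte ranges listed above do not cover a UTF-8 non-breaking space. A minimal sketch, assuming the input is valid UTF-8 and PHP 7.0+ (for the "\u{...}" escape); the sample string and variable names are made up for illustration:

```php
<?php
// Hypothetical sample: 'a', space, U+00A0 (non-breaking space), space, 'b'.
$str = "a \u{00A0} b";

// With the u modifier, \s also matches Unicode whitespace such as U+00A0.
// Replace every whitespace character with one regular space.
$single = preg_replace('/\s/u', ' ', $str);

// Collapse each run of two or more whitespace characters into one space.
$collapsed = preg_replace('/\s{2,}/u', ' ', $str);

var_dump($single);
var_dump($collapsed);
```

Note that with the u modifier, preg_replace returns null if the subject is not valid UTF-8, so it may be worth checking the result before using it.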

Upvotes: 0

user3942918

Reputation: 26375

The middle character is a UTF-8 encoded non-breaking space (U+00A0, bytes 0xC2 0xA0). Add the UTF-8 modifier u to your regex and it will work just fine, e.g. /\s{2,}/u.

Outputs:

array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(4) "   "
  }
}
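
To check what the characters actually are, dumping the bytes is usually enough. The sketch below assumes the string is a space, a non-breaking space, and another space (which is consistent with the 4-byte length shown above) and builds it explicitly with the PHP 7.0+ "\u{...}" escape rather than relying on copy and paste:

```php
<?php
// Assumed content of the question's string: space, U+00A0, space.
$str = " \u{00A0} ";

// Hex dump of the raw bytes: 20 is a space, c2 a0 is a UTF-8 encoded U+00A0.
$hex = bin2hex($str);
echo $hex, "\n";

// With the u modifier the whole run is matched as whitespace.
$count = preg_match_all('/\s{2,}/u', $str, $matches);
var_dump($count, $matches);
```

Here bin2hex should print 20c2a020, and the single match is the full 4-byte string.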


Upvotes: 2
