David

Reputation: 2783

Unexpected behavior of preg_replace with OSX

I'm wondering how a simple preg_replace intended to collapse runs of whitespace can turn the character à into a question mark, as in the following code:

$str = 'nnn      à    nnn     é  nnn';
echo preg_replace('/\s+/', ' ', $str) . "\n";
// outputs 'nnn ? nnn é nnn'

This occurs on a Mac running OS X 10.8.4. Any ideas?

Upvotes: 0

Views: 147

Answers (2)

punkeel

Reputation: 979

Strange.

$ cat test.php
<?php
$str = '   à   n';
file_put_contents('a.bin',preg_replace('/\s+/', ' ', $str) . "\n");

file_put_contents('b.bin', 'à');

First, set up a test file containing à, named c.bin

$ php test.php 

Then we cat the files to compare:

$ cat b.bin
à$ cat c.bin
à

Files b.bin and c.bin contain à, as expected.

$ hexdump -C b.bin 
00000000  c3 a0                                             |..|
00000002
$ hexdump -C c.bin 
00000000  c3 a0 0a                                          |...|
00000003

The hexdump shows that à is stored as the two bytes c3 a0, which is its UTF-8 encoding.

$ cat a.bin 
 ? n
$ hexdump -C a.bin 
00000000  20 c3 20 6e 0a                                    | . n.|
00000005

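In the first file, a.bin, the a0 byte is gone: without the u modifier the pattern works byte by byte, and \s apparently matched a0 on its own (a0 is NO-BREAK SPACE in Latin-1). What remains is a lone c3, which is not valid UTF-8, so it is rendered as ?.

So it doesn't seem to be a display encoding error: the string itself is altered by preg_replace.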
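
To reproduce the hexdump of a.bin on any platform, here is a minimal sketch. It assumes the Mac's \s treated byte a0 as whitespace, and mimics that by adding \xA0 to the character class explicitly. The à is written as raw bytes so the result does not depend on how the file is saved.

```php
<?php
// "\xC3\xA0" is the UTF-8 encoding of à, spelled out as explicit bytes.
$str = "   \xC3\xA0   n";

// Byte mode (no u modifier): \xA0 in the class is a single byte, so the
// second byte of à is swallowed together with the spaces that follow it.
// Listing \xA0 explicitly is an assumption that imitates \s on the Mac.
$byteMode = preg_replace('/[\s\xA0]+/', ' ', $str);

// UTF-8 mode: the pattern sees whole characters, so à is left intact.
$utf8Mode = preg_replace('/\s+/u', ' ', $str);

echo bin2hex($byteMode), "\n"; // 20c3206e
echo bin2hex($utf8Mode), "\n"; // 20c3a0206e
```

The first line matches the bytes seen in a.bin (minus the trailing 0a): c3 is left without its a0.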

EDIT: You could use mb_ereg_replace or the u modifier (as suggested by HamZa):

$ cat test.php 
<?php
$str = 'nnn      à    nnn     é  nnn';
var_dump(preg_replace('/\s+/u', ' ', $str));
var_dump(mb_ereg_replace('\s+', ' ', $str));
$ php test.php 
string(17) "nnn à nnn é nnn"
string(17) "nnn à nnn é nnn"
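
If you go with the u modifier, keep in mind that preg_replace returns null when the subject is not valid UTF-8. A minimal sketch of a wrapper that makes this failure visible (collapse_whitespace is a hypothetical helper name, not something PHP provides):

```php
<?php
// collapse_whitespace() is a hypothetical helper, not part of PHP.
function collapse_whitespace(string $text): string
{
    // The u modifier makes PCRE treat both pattern and subject as UTF-8.
    $result = preg_replace('/\s+/u', ' ', $text);

    // preg_replace() returns null on failure, e.g. PREG_BAD_UTF8_ERROR
    // when the subject contains invalid UTF-8.
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

// à and é written as explicit UTF-8 bytes.
var_dump(collapse_whitespace("nnn      \xC3\xA0    nnn     \xC3\xA9  nnn"));
```

Throwing instead of returning null avoids silently echoing an empty string when the input turns out to be in another encoding.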

Upvotes: 2

Ghabriel Nunes

Reputation: 372

You can change the encoding to UTF-8 in an HTML page with the following tag:

<meta http-equiv="Content-type" content="text/html; charset=utf-8">

Since it's probably a problem with your encoding, that tag will probably fix it.

Upvotes: 0
