Reputation: 2783
Given the following piece of code, I'm wondering how a simple preg_replace
intended to collapse runs of whitespace can turn the character à
into a question mark:
$str = 'nnn à nnn é nnn';
echo preg_replace('/\s+/', ' ', $str) . "\n";
// outputs 'nnn ? nnn é nnn'
This occurs on a Mac using OSX 10.8.4. Any idea?
Upvotes: 0
Views: 147
Reputation: 979
Strange.
$ cat test.php
<?php
$str = ' à n';
file_put_contents('a.bin',preg_replace('/\s+/', ' ', $str) . "\n");
file_put_contents('b.bin', 'à');
First, set up a test file named c.bin containing à, then run the script:
$ php test.php
Then we cat the files to compare:
$ cat b.bin
à$ cat c.bin
à
Files b.bin and c.bin contain à as expected.
$ hexdump -C b.bin
00000000 c3 a0 |..|
00000002
$ hexdump -C c.bin
00000000 c3 a0 0a |...|
00000003
Thanks to hexdump we can see that à is encoded as the two bytes c3 a0 (the extra 0a in c.bin is just a trailing newline).
$ cat a.bin
? n
$ hexdump -C a.bin
00000000 20 c3 20 6e 0a | . n.|
00000005
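In the first file, a.bin, the a0 byte is gone and the accent is badly rendered: without the u modifier the pattern is matched byte by byte, and \s apparently treated the lone byte a0 (NO-BREAK SPACE in Latin-1) as whitespace, leaving an orphaned c3.
So it doesn't seem to be a display-encoding error; the string itself has been altered.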
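To make the byte-level effect easier to see, here is a minimal sketch that inspects the strings with bin2hex. Whether a bare \s matches the byte a0 depends on the platform and locale, so the sketch adds \xA0 to the character class explicitly to reproduce the damage; that explicit class is my assumption for illustration, not what the original code does.

```php
<?php
// " à n" written with explicit UTF-8 bytes so the source file encoding does not matter.
$str = " \xC3\xA0 n";

// Byte mode (no u modifier). \xA0 is added by hand to mimic a platform
// where \s treats the byte 0xA0 as whitespace (assumption for illustration).
$damaged = preg_replace('/[\s\xA0]+/', ' ', $str);

// UTF-8 mode: the two bytes c3 a0 are read as one character (U+00E0),
// which is not whitespace, so they are left alone.
$intact = preg_replace('/\s+/u', ' ', $str);

echo bin2hex($str), "\n";     // 20c3a0206e
echo bin2hex($damaged), "\n"; // 20c3206e   (a0 swallowed, c3 orphaned)
echo bin2hex($intact), "\n";  // 20c3a0206e
```

The damaged result matches the bytes seen in a.bin (minus the trailing newline), which suggests the same mechanism is at work there.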
EDIT:
You could use mb_ereg_replace or the u modifier (as suggested by HamZa):
$ cat test.php
<?php
$str = 'nnn à nnn é nnn';
var_dump(preg_replace('/\s+/u', ' ', $str));
var_dump(mb_ereg_replace('\s+', ' ', $str));
$ php test.php
string(17) "nnn à nnn é nnn"
string(17) "nnn à nnn é nnn"
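If you need this in several places, the u-modifier variant could be wrapped in a small helper. This is only a sketch: the function name collapse_whitespace is hypothetical, and the scalar type declarations assume PHP 7 or later. It also checks for null, because with the u modifier preg_replace returns null when the subject is not valid UTF-8.

```php
<?php
// Hypothetical helper (not from the original answers): collapse runs of
// whitespace in a UTF-8 string without breaking multibyte characters.
function collapse_whitespace(string $text): string
{
    $result = preg_replace('/\s+/u', ' ', $text);
    if ($result === null) {
        // With the u modifier, invalid UTF-8 input makes preg_replace return null.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

// "nnn  à   nnn é nnn" with the accented characters given as explicit UTF-8 bytes.
var_dump(collapse_whitespace("nnn  \xC3\xA0   nnn \xC3\xA9 nnn"));
```

Throwing on invalid input is one possible choice; returning the original string unchanged would be another, depending on how strict you want to be.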
Upvotes: 2
Reputation: 372
You can change the encoding to UTF-8 in an HTML page with the following tag:
<meta http-equiv="Content-type" content="text/html; charset=utf-8">
Since it's probably a problem with your encoding, that tag will probably fix it.
Upvotes: 0