JoulinRouge

Reputation: 466

PHP preg_replace: wrong charset or encoding

I have this simple code:

function getCleanText($rawText) //removes doublespace and punctuation
{
    return strtolower(preg_replace("/[\s\t]+/u", " ", 
        preg_replace("/[^a-zA-Z1-9àèéìòù]+/u", " ", $rawText)));
}

echo getCleanText("uscì"). " uscì <br>";

The function just removes punctuation and double spaces. Why do I get this output?

usc�� uscì 

I mean, "uscì" doesn't contain any punctuation, so the function should return it unchanged. Yet I have the same problem with all accented letters. The web page is encoded in UTF-8. If I try utf8_encode like this:

return utf8_encode(strtolower(preg_replace("/[\s\t]+/u", " ", 
        preg_replace("/[^a-zA-Z1-9àèéìòù]+/u", " ", $rawText))));

the output is

usc㬠uscì 

Any ideas? Where can I find some documentation to understand my error?

Upvotes: 2

Views: 535

Answers (1)

StuBez

Reputation: 1364

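Using mb_strtolower rather than plain strtolower resolves the problem in my tests. I assume it's a configuration issue (php.ini or the server locale), which would explain why it works OK for some people and not others: strtolower works byte by byte and, before PHP 8.2, follows the current locale, so with a single-byte locale it can change one byte of a multibyte UTF-8 character such as "ì" and leave an invalid sequence behind.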
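For reference, here is a minimal sketch of the function from the question with only the lowercasing step swapped. It assumes the mbstring extension is enabled, that the source file is saved as UTF-8, and that the input is valid UTF-8; the character class is copied from the question unchanged, and the sample strings are just illustrations.

```php
<?php
// Sketch only: same cleanup as in the question, but lowercasing with
// mb_strtolower so multibyte characters are treated as whole characters.
function getCleanText(string $rawText): string
{
    // Replace every run of characters outside the allowed set with one space.
    // (Character class copied from the question as-is.)
    $text = preg_replace('/[^a-zA-Z1-9àèéìòù]+/u', ' ', $rawText);

    // Collapse runs of whitespace; \s already includes tabs.
    $text = preg_replace('/\s+/u', ' ', $text);

    // Pass the encoding explicitly instead of relying on the configured
    // default (assumption: the input is valid UTF-8).
    return mb_strtolower($text, 'UTF-8');
}

echo getCleanText("uscì"), "\n";
echo getCleanText("Uscì,  FUORI!"), "\n";
```

Passing 'UTF-8' explicitly means the result should not depend on the default_charset or mbstring settings of a particular server, which may be why plain strtolower behaves differently from one machine to another.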

Upvotes: 1
