Reputation: 466
I have this simple code:
function getCleanText($rawText) // removes double spaces and punctuation
{
    return strtolower(preg_replace("/[\s\t]+/u", " ",
        preg_replace("/[^a-zA-Z1-9àèéìòù]+/u", " ", $rawText)));
}
echo getCleanText("uscì"). " uscì <br>";
The function just removes punctuation and double spaces. Why do I get this output?
usc�� uscì
I mean, "uscì" doesn't contain any punctuation, so the function should return it unchanged. Yet I have the same problem with every accented letter. The web page is encoded in UTF-8. If I try with utf8_encode, like this:
return utf8_encode(strtolower(preg_replace("/[\s\t]+/u", " ",
preg_replace("/[^a-zA-Z1-9àèéìòù]+/u", " ", $rawText))));
the output is
usc㬠uscì
Any ideas? Where can I find documentation that would help me understand my error?
Upvotes: 2
Views: 535
Reputation: 1364
Using mb_strtolower, rather than just strtolower, resolves the problem in my tests. I assume it's a php.ini configuration issue, which would explain why it works OK for some people and not for others.
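A likely explanation, consistent with both outputs in the question: strtolower works byte by byte, and on older PHP versions it follows the current locale. "ì" is stored in UTF-8 as the two bytes C3 AC. Under a single-byte locale such as ISO-8859-1, C3 is treated as "Ã" and lowercased to E3 ("ã"), which breaks the sequence and yields "usc��". Passing that through utf8_encode gives "usc㬠", as shown. Below is a minimal sketch of the fix. It assumes the mbstring extension is loaded and the script file is saved as UTF-8. The name getCleanTextMb is hypothetical, and the digit range is written 0-9 on the assumption that 1-9 in the question was unintended.

```php
<?php
// Hypothetical multibyte-safe variant of getCleanText() from the question.
function getCleanTextMb(string $rawText): string
{
    // Replace each run of characters outside the allowed set with one space.
    // 0-9 is an assumption; the question's pattern uses 1-9.
    $lettersOnly = preg_replace('/[^a-zA-Z0-9àèéìòù]+/u', ' ', $rawText);

    // Collapse any remaining run of whitespace (\s already covers tabs).
    $singleSpaced = preg_replace('/\s+/u', ' ', $lettersOnly);

    // Lowercase by character, not by byte, so multibyte letters stay intact.
    return mb_strtolower($singleSpaced, 'UTF-8');
}

echo getCleanTextMb("Uscì,  DAL   bar"), "\n"; // prints "uscì dal bar"
```

Passing 'UTF-8' explicitly avoids depending on the configured internal encoding, which may differ between servers.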
Upvotes: 1