Reputation: 2128
I need to convert a URL slug like "você-é-um-ás-da-aviação" to "voce-e-um-as-da-aviacao" so that it is easy to read on the SERP.
I could use a plain character-by-character replacement, but I don't really like having to list each and every character: I find it clunky, and I want to keep language-specific characters out of the source code as much as I can.
Is it possible? Is it viable?
Upvotes: 3
Views: 377
Reputation: 8459
You could combine iconv, to transliterate your string to ASCII, with preg_replace, to remove any unwanted characters that remain.
Something like:
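// Transliteration depends on the current locale; under the default "C" locale iconv may emit "?" for accented letters.
setlocale(LC_CTYPE, 'en_US.UTF-8');
$string = "você-é-um-ás-da-aviação";
$collated = iconv('UTF-8', 'ASCII//TRANSLIT', $string);
// Drop anything that is not a hyphen, letter, or digit.
$filtered = preg_replace('`[^-a-zA-Z0-9]`', '', $collated);
echo $filtered;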
Upvotes: 0
Reputation: 58
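function url_safe($string) {
    // Transliteration is locale-dependent; change this to the locale of your language.
    setlocale(LC_ALL, 'fr_FR');
    // Convert accented letters to their closest ASCII equivalents.
    $url = iconv("UTF-8", "ASCII//TRANSLIT", $string);
    // Collapse every run of characters other than letters, digits, or underscores into one hyphen.
    $url = preg_replace('~[^\\pL0-9_]+~u', '-', $url);
    // Strip leading and trailing hyphens, then lowercase.
    $url = trim($url, "-");
    return strtolower($url);
}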
Upvotes: 3
Reputation: 98559
You could use the canonical decomposition mappings published by the Unicode Consortium (the files in http://www.unicode.org/Public/UNIDATA/ ).
However, this is not as simple as you seem to think it is. Believe it or not, there is a "kcal" symbol whose (compatibility) decomposition is four characters long.
You may also wish to consult the numeric equivalents tables there, as a "circled number seven" should probably map to the ASCII numeral seven, and so forth.
I strongly advise against this strategy, however - you're butchering your text for little gain, and can't recover the original input once you've transformed it.
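For the common case of accented Latin letters, the decomposition idea can be sketched without parsing the UNIDATA files by hand. The following is a minimal sketch, not a complete solution: it assumes PHP's intl extension is installed (it provides the Normalizer class), and strip_accents is a hypothetical helper name. It only removes combining marks after canonical decomposition; symbols such as the "kcal" sign or circled digits are left untouched.

```php
<?php
// Sketch only: requires the intl extension for the Normalizer class.
// strip_accents is a hypothetical helper name.
function strip_accents(string $text): string
{
    // NFD splits an accented letter into its base letter followed by
    // one or more combining marks (for example "e" + U+0301).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // input was not valid UTF-8; leave it untouched
    }
    // \p{Mn} matches nonspacing marks, i.e. the detached accents.
    return preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $text;
}

echo strip_accents("você-é-um-ás-da-aviação"), "\n";
```

Because the transformation cannot be reversed, it is probably safest to store the original string and treat the ASCII form as a derived value.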
Upvotes: 2
Reputation: 41533
I suggest you map every special character and its replacement in an array and then replace the text with a regex.
I know you said that you do not want to use a common replacement, but it's the only viable way to do this. You could filter the special characters out (by checking whether their character code falls outside the ASCII range), but that only removes them; it does not give you the correct replacement.
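A minimal sketch of the mapping approach, under a few assumptions: the map below is hypothetical and deliberately incomplete, slug_from_map is an illustrative name, and strtr is used for the lookup instead of a regex because it replaces all keys in one pass. Writing the keys as "\u{...}" escapes (PHP 7 or later) keeps the accented characters themselves out of the source file.

```php
<?php
// Hypothetical, incomplete map; extend it for the languages you need.
const SLUG_REPLACEMENTS = [
    "\u{00E1}" => 'a', // a with acute
    "\u{00E3}" => 'a', // a with tilde
    "\u{00E7}" => 'c', // c with cedilla
    "\u{00E9}" => 'e', // e with acute
    "\u{00EA}" => 'e', // e with circumflex
];

function slug_from_map(string $text): string
{
    // strtr replaces every key with its value in a single pass.
    $ascii = strtr($text, SLUG_REPLACEMENTS);
    // Anything still outside [-a-z0-9] has no mapping; drop it rather than guess.
    return preg_replace('/[^-a-z0-9]/', '', strtolower($ascii));
}

echo slug_from_map("você-é-um-ás-da-aviação"), "\n";
```

Characters missing from the map are silently dropped by the final filter, so the map has to cover every character you expect to see.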
Upvotes: 0