Reputation: 1725
I'm using PHP to clean up names for use in a URL slug, where $title might look like this: "This is the Title" or "This is the Title & Subtitle"
I want to change the examples above to "this-is-the-title" and "this-is-the-title-subtitle", respectively. So I wrote this code:
<?php
$input1 = str_replace(" ","-",strtolower($title));
$output1 = preg_replace('/[^A-Za-z0-9-]/', '', $input1);
$output2 = str_replace("--","-",$output1);
echo $output2;
?>
It works well: it strips all non-alphanumeric characters, replaces spaces with dashes, and makes everything lower case.
However, in some instances it returns a double dash: "Title & More" turns into "title--more". It should be "title-more". I know why the double dash is there, but I can't seem to clean it out.
I added the line of code for $output2, but it doesn't seem to work for some reason. After lots of trial and error, I'm at a loss.
Thanks...
Upvotes: 3
Views: 3991
Reputation: 2032
I think this is best:
$output = trim(preg_replace('/-+/', '-', $str), '-');
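As a rough illustration of how that line behaves, here is a minimal sketch. The function name `collapse_dashes` is made up for this example, and `$str` is assumed to be the already-cleaned string (`$output1` in the question's code):

```php
<?php
// Hypothetical helper wrapping the one-liner above.
function collapse_dashes(string $str): string
{
    // '/-+/' matches a whole run of one or more dashes,
    // so a run of any length is replaced in a single pass.
    $collapsed = preg_replace('/-+/', '-', $str);
    // trim() with a character list also drops leading and trailing dashes.
    return trim($collapsed, '-');
}

echo collapse_dashes('books---magazines'), "\n"; // books-magazines
echo collapse_dashes('-title--more-'), "\n";     // title-more
```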
Upvotes: 0
Reputation: 1725
After sleeping on the issue, then reading these replies (thanks as always) it finally dawned on me what was happening.
My code was leaving me with three dashes. This: "Books & Magazines" was changing to this: "Books---Magazines" (the & replaced with a dash, plus the two spaces, gives 3 dashes).
I ran it through a str_replace to turn double dashes into single ones, but was still left with a double dash, and this was what was driving me crazy.
I kept getting this: "Books--Magazines"
Turns out the str_replace was actually working. Since there were three dashes, the first two were replaced with a single dash, while the THIRD dash, being only a SINGLE dash, was left alone.
Thus, this: "---" became this "--"
I needed to run it through str_replace one more time to fix the problem. The solved code looks just like my original, but with one more line.
Probably not the most elegant solution, but it works and makes sense in my head finally.
<?php
$input1 = str_replace(" ","-",strtolower($title));
$output1 = preg_replace('/[^A-Za-z0-9-]/', '', $input1);
$output2 = str_replace("--","-",$output1);
$output3 = str_replace("--","-",$output2);
echo $output3;
?>
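The single-pass behaviour described above can be checked directly. A minimal sketch (the helper name `two_passes` is made up for illustration) showing that one pass turns three dashes into two, and that two passes are still not enough once five dashes appear in a row:

```php
<?php
// str_replace() scans left to right once and does not re-examine its own output.
echo str_replace('--', '-', '---'), "\n"; // "--": first pair collapsed, third dash untouched

// Hypothetical helper: apply the same replacement twice, as in the code above.
function two_passes(string $s): string
{
    return str_replace('--', '-', str_replace('--', '-', $s));
}

echo two_passes('---'), "\n";   // "-"
echo two_passes('----'), "\n";  // "-"
echo two_passes('-----'), "\n"; // "--": five dashes still leave a double dash
```

So the extra line covers runs of up to four dashes; longer runs would need a loop or a regex such as '/-+/'.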
Upvotes: 0
Reputation: 75714
You can solve the same in a single regex:
preg_replace('/[^a-z0-9]+/', '-', strtolower($title));
The only change I made was the trailing + in the regex, meaning "1 or more occurrences of the previous group". Now every group of special characters is replaced with a single dash, no matter how long the group is.
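To see what the single regex produces on the question's inputs, here is a small sketch. The wrapper name `single_regex_slug` is hypothetical. Note that a title ending in punctuation keeps a trailing dash, which could be removed with trim($slug, '-') if that is unwanted:

```php
<?php
// Hypothetical wrapper around the single-regex approach shown above.
function single_regex_slug(string $title): string
{
    // Every run of characters outside a-z and 0-9 becomes exactly one dash.
    return preg_replace('/[^a-z0-9]+/', '-', strtolower($title));
}

echo single_regex_slug('This is the Title & Subtitle'), "\n"; // this-is-the-title-subtitle
echo single_regex_slug('Title & More'), "\n";                 // title-more
echo single_regex_slug('Title & More!'), "\n";                // title-more- (trailing dash)
```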
Just for answering the actual question, though: You would need to reduce duplicate dashes in a loop in your case:
$output2 = $output1;
do {
    $output1 = $output2;
    $output2 = str_replace("--", "-", $output1);
} while ($output2 != $output1);
(I would seriously consider renaming the variables, though)
Upvotes: 6
Reputation: 1774
Use this to collapse multiple spaces into one: $title = preg_replace('/\s+/', ' ', $title);
After this, add the dashes to the string: str_replace(" ","-",strtolower($title));
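One way these two steps might be combined with the question's cleanup, as a sketch only. The function name `slug_from_spaces` is made up, and stripping the special characters before collapsing the whitespace is an assumption added here; without it, a removed "&" would still leave two dashes side by side:

```php
<?php
// Hypothetical helper; the stripping step is an addition, not part of the answer above.
function slug_from_spaces(string $title): string
{
    $s = strtolower($title);
    // Drop everything except letters, digits and whitespace first,
    // so a removed "&" cannot leave two separators next to each other.
    $s = preg_replace('/[^a-z0-9\s]/', '', $s);
    // Collapse runs of whitespace into one space, then trim the ends.
    $s = trim(preg_replace('/\s+/', ' ', $s));
    // Finally turn the remaining single spaces into dashes.
    return str_replace(' ', '-', $s);
}

echo slug_from_spaces('Title & More'), "\n"; // title-more
```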
Upvotes: 1
Reputation: 17762
I can share my little function. It works even with all kinds of languages: Russian, German, etc.
public static function getSeo($str, $separator = '-'){
$from = array('А', 'Б', 'В', 'Г', 'Д', 'Е', 'Ё', 'Ж', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р',
'С', 'Т', 'У', 'Ф', 'Х', 'Ц', 'Ч', 'Ш', 'Щ', 'Ъ', 'Ы', 'Ь', 'Э', 'Ю', 'Я', 'а', 'б', 'в', 'г', 'д', 'е', 'ё',
'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы',
'ь', 'э', 'ю', 'я', "Á", "À", "Â", "Ä", "Ă", "Ā", "Ã", "Å", "Ą", "Æ", "Ć", "Ċ", "Ĉ", "Č", "Ç", "Ď", "Đ", "Ð",
"É", "È", "Ė", "Ê", "Ë", "Ě", "Ē", "Ę", "Ə", "Ġ", "Ĝ", "Ğ", "Ģ", "á", "à", "â", "ä", "ă", "ā", "ã", "å", "ą",
"æ", "ć", "ċ", "ĉ", "č", "ç", "ď", "đ", "ð", "é", "è", "ė", "ê", "ë", "ě", "ē", "ę", "ə", "ġ", "ĝ", "ğ", "ģ",
"Ĥ", "Ħ", "I", "Í", "Ì", "İ", "Î", "Ï", "Ī", "Į", "IJ", "Ĵ", "Ķ", "Ļ", "Ł", "Ń", "Ň", "Ñ", "Ņ", "Ó", "Ò", "Ô",
"Ö", "Õ", "Ő", "Ø", "Ơ", "Œ", "ĥ", "ħ", "ı", "í", "ì", "i", "î", "ï", "ī", "į", "ij", "ĵ", "ķ", "ļ", "ł", "ń",
"ň", "ñ", "ņ", "ó", "ò", "ô", "ö", "õ", "ő", "ø", "ơ", "œ", "Ŕ", "Ř", "Ś", "Ŝ", "Š", "Ş", "Ť", "Ţ", "Þ", "Ú",
"Ù", "Û", "Ü", "Ŭ", "Ū", "Ů", "Ų", "Ű", "Ư", "Ŵ", "Ý", "Ŷ", "Ÿ", "Ź", "Ż", "Ž", "ŕ", "ř", "ś", "ŝ", "š", "ş",
"ß", "ť", "ţ", "þ", "ú", "ù", "û", "ü", "ŭ", "ū", "ů", "ų", "ű", "ư", "ŵ", "ý", "ŷ", "ÿ", "ź", "ż", "ž"
);
$to = array('A', 'B', 'V', 'G', 'D', 'E', 'E', 'Z', 'Z', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'R', 'S',
'T', 'U', 'F', 'H', 'C', 'Tch', 'Sh', 'Shtch', '', 'Y', '', 'E', 'Iu', 'Ja', 'a', 'b', 'v', 'g', 'd', 'e',
'e', 'z', 'z', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'h', 'c', 'tch', 'sh',
'shtch', '', 'y', '', 'e', 'iu', 'ja', "A", "A", "A", "A", "A", "A", "A", "A", "A", "AE", "C", "C", "C", "C",
"C", "D", "D", "D", "E", "E", "E", "E", "E", "E", "E", "E", "G", "G", "G", "G", "G", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "ae", "c", "c", "c", "c", "c", "d", "d", "d", "e", "e", "e", "e", "e", "e", "e", "e", "g",
"g", "g", "g", "g", "H", "H", "I", "I", "I", "I", "I", "I", "I", "I", "IJ", "J", "K", "L", "L", "N", "N", "N",
"N", "O", "O", "O", "O", "O", "O", "O", "O", "CE", "h", "h", "i", "i", "i", "i", "i", "i", "i", "i", "ij", "j",
"k", "l", "l", "n", "n", "n", "n", "o", "o", "o", "o", "o", "o", "o", "o", "o", "R", "R", "S", "S", "S", "S",
"T", "T", "T", "U", "U", "U", "U", "U", "U", "U", "U", "U", "U", "W", "Y", "Y", "Y", "Z", "Z", "Z", "r", "r",
"s", "s", "s", "s", "B", "t", "t", "b", "u", "u", "u", "u", "u", "u", "u", "u", "u", "u", "w", "y", "y", "y",
"z", "z", "z"
);
$str = str_replace($from, $to, $str);
$str = iconv('UTF-8', 'ASCII//IGNORE//TRANSLIT', $str);
$str = trim(preg_replace('/[^ A-Za-z0-9_-]/', ' ', $str));
return preg_replace('/[ -]+/', $separator, $str);
}
Upvotes: 3
Reputation: 3163
You can replace <space>&<space> with a single -, or replace more than one instance of - with a single -.
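A minimal sketch of the first suggestion, assuming the rest of the question's pipeline stays the same (the function name `slug_ampersand_first` is hypothetical). It only handles the exact " & " pattern, so other punctuation surrounded by spaces would still produce a double dash:

```php
<?php
// Hypothetical helper illustrating the "replace ' & ' first" idea.
function slug_ampersand_first(string $title): string
{
    $s = strtolower($title);
    // Treat " & " as one separator before touching anything else.
    $s = str_replace(' & ', '-', $s);
    // Then proceed as in the question: spaces to dashes, strip the rest.
    $s = str_replace(' ', '-', $s);
    return preg_replace('/[^a-z0-9-]/', '', $s);
}

echo slug_ampersand_first('This is the Title & Subtitle'), "\n"; // this-is-the-title-subtitle
```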
Upvotes: 0
Reputation: 17010
I'm using a function written by myself to reach exactly the same goal.
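function urlify($string, $utf8Input = false) {
    // Transliterate accented characters to their closest ASCII equivalents, then lower-case.
    $string = strtolower(iconv($utf8Input ? 'UTF-8' : 'ISO-8859-1', 'ASCII//TRANSLIT', $string));
    // Collapse every run of non-alphanumeric characters into a single dash.
    $string = preg_replace('/[^a-z0-9]+/', '-', $string);
    $string = trim($string, '-');
    // Compare against '' rather than using empty(), which also treats "0" as empty.
    if ($string === '') return '-';
    return $string;
}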
You can remove the UTF8 and iconv part if you are not working with languages that use accented letters.
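For plain ASCII titles, the trimmed-down version might look like the sketch below. The name `urlify_ascii` is made up for this example, and the fallback to a single dash simply mirrors the function above:

```php
<?php
// Hypothetical ASCII-only variant of urlify() with the iconv step removed.
function urlify_ascii(string $string): string
{
    // Lower-case, then collapse every run of non-alphanumerics into one dash.
    $string = preg_replace('/[^a-z0-9]+/', '-', strtolower($string));
    $string = trim($string, '-');
    // Fall back to a single dash when nothing usable is left.
    return $string === '' ? '-' : $string;
}

echo urlify_ascii('Books & Magazines'), "\n"; // books-magazines
echo urlify_ascii('&&&'), "\n";               // -
```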
Upvotes: 3