Kevin

Reputation: 1725

PHP: clean double dash with str_replace

I'm using PHP to clean up names for use in a URL slug, where $title might look like this: "This is the Title" or "This is the Title & Subtitle".

I want to change the above examples to "this-is-the-title" and "this-is-the-title-subtitle", respectively. So I wrote this code:

<?php 
$input1 = str_replace(" ","-",strtolower($title)); 
$output1 = preg_replace('/[^A-Za-z0-9-]/', '', $input1); 
$output2 = str_replace("--","-",$output1); 
echo $output2; 
?>

It's working great: it strips all non-alphanumeric characters, replaces spaces with dashes and makes everything lowercase.

However, in some instances it returns a double dash ("Title & More" turns into "title--more"). It should be "title-more". I know why the double dash is there, but I can't seem to clean it out.

I put in the line of code for $output2, but it doesn't seem to be working for some reason. After lots of trial and error, I'm at a loss.

Thanks...

Upvotes: 3

Views: 3991

Answers (7)

Hoàng Vũ Tgtt

Reputation: 2032

I think this is best:

 $output =  trim(preg_replace('/-+/', '-', $str), '-');

More: How can I convert two or more dashes to singles and remove all dashes at the beginning and end of a string?
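
As a quick check of the one-liner above, here is a minimal sketch that applies it to strings like the ones the question produces. The function name collapse_dashes and the sample inputs are hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: collapse runs of dashes, then trim dashes from both ends.
function collapse_dashes(string $str): string
{
    // '/-+/' matches one or more consecutive dashes.
    return trim(preg_replace('/-+/', '-', $str), '-');
}

echo collapse_dashes('title---more'), "\n";                // title-more
echo collapse_dashes('--books--and-----magazines-'), "\n"; // books-and-magazines
```

Because the pattern matches a whole run at once, the length of the run does not matter.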

Upvotes: 0

Kevin

Reputation: 1725

After sleeping on the issue, then reading these replies (thanks as always) it finally dawned on me what was happening.

My code was leaving me with three dashes. This: "Books & Magazines" was changing to this: "Books---Magazines" (the & replaced with a dash, plus the spaces replaced with dashes, gives 3 dashes).

I ran it through a str_replace to turn double dashes into single ones, but I was still left with a double dash, and this was what was driving me crazy.

I kept getting this: "Books--Magazines"

Turns out the str_replace was actually working. Since there were three dashes, the first double dash was replaced with a single dash, while the THIRD dash, being only a SINGLE dash, was left alone.

Thus, this: "---" became this: "--"

I needed to run it through str_replace one more time to fix the problem. The solved code looks just like my original code, but with one more line.

Probably not the most elegant solution, but it works and makes sense in my head finally.

<?php 
$input1 = str_replace(" ","-",strtolower($title)); 
$output1 = preg_replace('/[^A-Za-z0-9-]/', '', $input1); 
$output2 = str_replace("--","-",$output1); 
$output3 = str_replace("--","-",$output2); 
echo $output3; 
?>
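
To see that behaviour in isolation, here is a minimal sketch with illustrative sample strings (they are assumptions, not taken from real titles). str_replace makes a single left-to-right pass and does not rescan its own output, so two passes only cover runs of up to four dashes.

```php
<?php
// One pass turns "---" into "--"; a second pass finishes the job.
$once  = str_replace('--', '-', 'books---magazines');
$twice = str_replace('--', '-', $once);
echo $once, "\n";   // books--magazines
echo $twice, "\n";  // books-magazines

// Five dashes would need a third pass: "-----" -> "---" -> "--" -> "-".
$five = str_replace('--', '-', str_replace('--', '-', 'a-----b'));
echo $five, "\n";   // a--b
```

If longer runs can occur, a loop or a regex with + avoids having to guess the number of passes.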

Upvotes: 0

soulmerge

Reputation: 75714

You can solve the same in a single regex:

preg_replace('/[^a-z0-9]+/', '-', strtolower($title));

The only change I made was the trailing + in the regex, meaning "one or more occurrences of the preceding element" (here, the character class). Now every group of special characters is replaced with a single dash, no matter how long the group is.

Just for answering the actual question, though: You would need to reduce duplicate dashes in a loop in your case:

$output2 = $output1;
do {
    $output1 = $output2;
    $output2 = str_replace("--", "-", $output1);
} while ($output2 != $output1);

(I would seriously consider renaming the variables, though)
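
For illustration, here is the same loop as a sketch with more descriptive names, next to the single-regex version for comparison. The function name squeeze_dashes and the sample inputs are hypothetical.

```php
<?php
// Hypothetical helper; same logic as the do/while loop above.
function squeeze_dashes(string $slug): string
{
    do {
        $previous = $slug;
        $slug = str_replace('--', '-', $previous);
    } while ($slug !== $previous); // stop once a pass changes nothing
    return $slug;
}

// Single-regex approach, for comparison.
$title = 'Books & Magazines';
$slug  = preg_replace('/[^a-z0-9]+/', '-', strtolower($title));

echo $slug, "\n";                               // books-magazines
echo squeeze_dashes('books---magazines'), "\n"; // books-magazines
echo squeeze_dashes('a-----b'), "\n";           // a-b
```

Note that the regex version can still leave a dash at the start or end of the slug if the title begins or ends with a special character; wrapping it in trim($slug, '-') would remove those.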

Upvotes: 6

Jigar Tank

Reputation: 1774

Use this to collapse multiple spaces into one: $title = preg_replace('/\s+/', ' ', $title);
After this, add the dashes to the string: str_replace(" ", "-", strtolower($title));
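
A minimal sketch of how these two steps could be combined with the cleanup from the question. The ordering is an assumption that the answer does not state: punctuation is stripped before the whitespace is collapsed, because otherwise a character such as & would still sit between two dashes and leave a double dash once it is removed.

```php
<?php
$title = 'Title & More';

// Assumption: strip everything except letters, digits and whitespace first,
// so that "&" does not survive between the two spaces.
$clean = preg_replace('/[^a-z0-9\s]/', '', strtolower($title)); // "title  more"

// Collapse each run of whitespace into one space, then trim the ends.
$clean = trim(preg_replace('/\s+/', ' ', $clean));               // "title more"

// Finally turn the remaining single spaces into dashes.
$slug = str_replace(' ', '-', $clean);

echo $slug, "\n"; // title-more
```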

Upvotes: 1

Mārtiņš Briedis

Reputation: 17762

I can share my little function. It works with all kinds of languages: Russian, German, etc.

public static function getSeo($str, $separator = '-'){
    $from = array('А', 'Б', 'В', 'Г', 'Д', 'Е', 'Ё', 'Ж', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р',
        'С', 'Т', 'У', 'Ф', 'Х', 'Ц', 'Ч', 'Ш', 'Щ', 'Ъ', 'Ы', 'Ь', 'Э', 'Ю', 'Я', 'а', 'б', 'в', 'г', 'д', 'е', 'ё',
        'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы',
        'ь', 'э', 'ю', 'я', "Á", "À", "Â", "Ä", "Ă", "Ā", "Ã", "Å", "Ą", "Æ", "Ć", "Ċ", "Ĉ", "Č", "Ç", "Ď", "Đ", "Ð",
        "É", "È", "Ė", "Ê", "Ë", "Ě", "Ē", "Ę", "Ə", "Ġ", "Ĝ", "Ğ", "Ģ", "á", "à", "â", "ä", "ă", "ā", "ã", "å", "ą",
        "æ", "ć", "ċ", "ĉ", "č", "ç", "ď", "đ", "ð", "é", "è", "ė", "ê", "ë", "ě", "ē", "ę", "ə", "ġ", "ĝ", "ğ", "ģ",
        "Ĥ", "Ħ", "I", "Í", "Ì", "İ", "Î", "Ï", "Ī", "Į", "IJ", "Ĵ", "Ķ", "Ļ", "Ł", "Ń", "Ň", "Ñ", "Ņ", "Ó", "Ò", "Ô",
        "Ö", "Õ", "Ő", "Ø", "Ơ", "Œ", "ĥ", "ħ", "ı", "í", "ì", "i", "î", "ï", "ī", "į", "ij", "ĵ", "ķ", "ļ", "ł", "ń",
        "ň", "ñ", "ņ", "ó", "ò", "ô", "ö", "õ", "ő", "ø", "ơ", "œ", "Ŕ", "Ř", "Ś", "Ŝ", "Š", "Ş", "Ť", "Ţ", "Þ", "Ú",
        "Ù", "Û", "Ü", "Ŭ", "Ū", "Ů", "Ų", "Ű", "Ư", "Ŵ", "Ý", "Ŷ", "Ÿ", "Ź", "Ż", "Ž", "ŕ", "ř", "ś", "ŝ", "š", "ş",
        "ß", "ť", "ţ", "þ", "ú", "ù", "û", "ü", "ŭ", "ū", "ů", "ų", "ű", "ư", "ŵ", "ý", "ŷ", "ÿ", "ź", "ż", "ž"
    );
    $to = array('A', 'B', 'V', 'G', 'D', 'E', 'E', 'Z', 'Z', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'R', 'S',
        'T', 'U', 'F', 'H', 'C', 'Tch', 'Sh', 'Shtch', '', 'Y', '', 'E', 'Iu', 'Ja', 'a', 'b', 'v', 'g', 'd', 'e',
        'e', 'z', 'z', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'h', 'c', 'tch', 'sh',
        'shtch', '', 'y', '', 'e', 'iu', 'ja', "A", "A", "A", "A", "A", "A", "A", "A", "A", "AE", "C", "C", "C", "C",
        "C", "D", "D", "D", "E", "E", "E", "E", "E", "E", "E", "E", "G", "G", "G", "G", "G", "a", "a", "a", "a", "a",
        "a", "a", "a", "a", "ae", "c", "c", "c", "c", "c", "d", "d", "d", "e", "e", "e", "e", "e", "e", "e", "e", "g",
        "g", "g", "g", "g", "H", "H", "I", "I", "I", "I", "I", "I", "I", "I", "IJ", "J", "K", "L", "L", "N", "N", "N",
        "N", "O", "O", "O", "O", "O", "O", "O", "O", "CE", "h", "h", "i", "i", "i", "i", "i", "i", "i", "i", "ij", "j",
        "k", "l", "l", "n", "n", "n", "n", "o", "o", "o", "o", "o", "o", "o", "o", "o", "R", "R", "S", "S", "S", "S",
        "T", "T", "T", "U", "U", "U", "U", "U", "U", "U", "U", "U", "U", "W", "Y", "Y", "Y", "Z", "Z", "Z", "r", "r",
        "s", "s", "s", "s", "B", "t", "t", "b", "u", "u", "u", "u", "u", "u", "u", "u", "u", "u", "w", "y", "y", "y",
        "z", "z", "z"
    );
    $str = str_replace($from, $to, $str);
    $str = iconv('UTF-8', 'ASCII//IGNORE//TRANSLIT', $str);
    $str = trim(preg_replace('/[^ A-Za-z0-9_-]/', ' ', $str));
    return preg_replace('/[ -]+/', $separator, $str);
}

Upvotes: 3

Ravi Bhatt

Reputation: 3163

You can replace <space>&<space> with a single -, or replace more than one consecutive - with a single -.

Upvotes: 0

lorenzo-s

Reputation: 17010

I'm using a function written by myself to reach exactly the same goal.

function urlify($string, $utf8Input = false) {
    $string = strtolower(iconv($utf8Input ? 'UTF-8' : 'ISO-8859-1', 'ASCII//TRANSLIT', $string));
    $string = preg_replace('/[^a-z0-9]+/', '-', $string);
    $string = trim($string, '-');
    if ($string === '') return '-'; // compare with '' explicitly: empty("0") is true in PHP
    return $string;
}

You can remove the UTF8 and iconv part if you are not working with languages that use accented letters.

Upvotes: 3
