Jerin Monish

Reputation: 115

Replace illegal characters in a text by underscore in PHP

I need to replace illegal characters with an underscore (_). For example, if the user-supplied text is "imageЙ ййé.png", the characters Й and йй should be replaced by _ and __, so the overall output must be image_ __é.png. This replacement must not happen for French characters. Here is the code I have so far; please help me get the expected output.

<?php
$allowed_char_array=array("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","à","á","â","ã","ä","å","æ","ç","è","é","ê","ë","ì","í","î","ï","ñ","ò","ó","ô","õ","ö","ð","ø","œ","š","Þ","ù","ú","û","ü","ý","ÿ","ž","0","1","2","3","4","5","6","7","8","9"," ","(",")","-","_",".","@","#","$","%","*","¢","ß","¥","£","™","©","®","ª","×","÷","±","+","-","²","³","¼","½","¾","µ","¿","¶","·","¸","º","°","¯","§","…","¤","¦","≠","¬","ˆ","¨","‰");
$word = 'imageЙ ййé.png';
$file_name = url_rewrite(trim($word));
$file_name2 = strtolower($file_name);
$split = str_split($file_name2);

if(is_array($split) && is_array($allowed_char_array)){
	$result=array_diff($split,$allowed_char_array);
	echo '<pre>';
	print_r($split);
	echo '<pre>';
	print_r($allowed_char_array);
	echo '<pre>';
	print_r($result);
}
function url_rewrite($chaine) {

    // On va formater la chaine de caractère
    // On remplace pour ne plus avoir d'accents
    $accents = array('é','à','è','À','É','È');
    $sans =    array('é','à','è','À','É','È');
    $chaine = str_replace($accents, $sans, $chaine);

    
    return $chaine;
}
?>

Upvotes: 2

Views: 82

Answers (2)

mickmackusa

Reputation: 47894

You will want to use mb_strtolower() to convert multibyte characters to lowercase safely.

My solution uses strtr() to convert your French accented letters to your preferred form.

Since all characters are lowercased from the onset, you can halve your white list of French characters.

Using pathinfo() helps you to dissect your filename.

Code: (Demo)

$word = 'imageЙ ййé.png';
$parts = pathinfo($word);
$filename = strtr(mb_strtolower($parts['filename']), ['é' =>'é', 'à' => 'à','è' => 'è']);
echo preg_replace('~[^ a-zéàè]~u', '_', $filename) , "." , $parts['extension'];

Output:

image_ __é.png
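
The same steps could be wrapped in a reusable function. This is only a sketch: the function name sanitize_filename is hypothetical, the whitelist is the short one from the snippet above, and the 'UTF-8' encoding argument is an assumption about the input. It also covers names without an extension, where pathinfo() returns no 'extension' key.

```php
<?php
// Hypothetical helper; the name and the short whitelist are assumptions.
function sanitize_filename(string $name): string
{
    $parts = pathinfo($name);
    // Lowercase with an explicit encoding so multibyte letters are handled.
    $base = mb_strtolower($parts['filename'], 'UTF-8');
    // Replace every character outside the whitelist with an underscore.
    $base = preg_replace('~[^ a-zéàè]~u', '_', $base);
    // pathinfo() omits the 'extension' key when the name contains no dot.
    return isset($parts['extension']) ? $base . '.' . $parts['extension'] : $base;
}

echo sanitize_filename('imageЙ ййé.png'), "\n"; // image_ __é.png
echo sanitize_filename('README'), "\n";         // readme
```

Note that the extension is passed through untouched here; if it can also contain illegal characters, it would need the same treatment.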

Upvotes: 1

Tim Biegeleisen

Reputation: 521178

I would build a regex (a negated character class, to be exact) from your whitelisted characters, and then replace every character that falls outside that class with an underscore.

$allowed_char_array = array("a","b","c","d","e"); // and others
// Escape characters such as "-" that are special inside a character class
$chars = preg_quote(implode("", $allowed_char_array), "/");
$regex = "/[^" . $chars . "]/u";
$input = "imageЙ ййé.png";
echo $regex . "\n";
$output = preg_replace($regex, "_", $input);
echo $input . "\n" . $output;

imageЙ ййé.png
image_ __é.png

If the above is not clear, here is what the actual call to preg_replace would look like:

preg_replace("/[^abcdefghijklmnopqrstuv]/u", "_", $input);

That is, any non-whitelisted character would be replaced with a single underscore. I did not bother to list out the entire character class, because you already have that in your source code.
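
One caveat when the class is built from a list: some whitelisted characters, such as "-", have a special meaning inside a character class. The sketch below uses a tiny illustrative whitelist (an assumption, not your full list) to show how an unescaped "-" silently turns into a range, and how preg_quote() avoids that.

```php
<?php
// Illustrative whitelist (an assumption): only the punctuation that causes trouble.
$allowed = array("(", ")", "-", "_");
$input = "A(b)";

// Unescaped: ")-_" is read as a range from ")" to "_", which includes "A".
$naive = "/[^" . implode("", $allowed) . "]/u";
// Escaped: preg_quote() backslash-escapes "(", ")" and "-", so each is literal.
$quoted = "/[^" . preg_quote(implode("", $allowed), "/") . "]/u";

echo preg_replace($naive, "_", $input), "\n";  // A(_)
echo preg_replace($quoted, "_", $input), "\n"; // _(_)
```

With the unescaped pattern the "A" survives even though it is not in the whitelist, which is the kind of bug that is easy to miss.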

Note that the /u flag in the regex is critical here, because your input string is a UTF-8 string. UTF-8 characters may consist of more than one byte, and using preg_replace on them without /u may have unexpected results.
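
To make the effect of /u concrete, here is a small comparison. It assumes the source file and the input are both UTF-8 encoded, and it uses a shortened whitelist for illustration.

```php
<?php
// Same pattern with and without the /u modifier (shortened whitelist, an assumption).
$input = "imageЙ ййé.png";

// Without /u the subject is matched byte by byte, so each two-byte
// Cyrillic letter is replaced by two underscores.
$bytewise = preg_replace('/[^a-zé. ]/', '_', $input);

// With /u the subject is matched character by character, so each
// Cyrillic letter is replaced by exactly one underscore.
$charwise = preg_replace('/[^a-zé. ]/u', '_', $input);

echo $bytewise, "\n"; // image__ ____é.png
echo $charwise, "\n"; // image_ __é.png
```

Only the /u version produces the one-underscore-per-character output asked for in the question.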

Upvotes: 2
