Reputation: 13
I have this function for checking that a name is valid and removing non-letters. However, if the name contains å, ä or ö, it removes that letter, since it is not part of the English alphabet.
<?php
//mb_internal_encoding('UTF-8');
function ValidName($namn = NULL, $efternamn = NULL){
if(isset($namn)) {
$namn = preg_replace('/[^A-Za-z]/', '', $namn);
return $namn;
}
if(isset($efternamn)) {
$efternamn = preg_replace('/\P{L}+/', '', $efternamn);
return $efternamn;
}
}
?>
I tried adding the u modifier after the closing / to allow Unicode letters, but then it removed the entire name. I also have this line in the file: mb_internal_encoding('UTF-8');
So how can I keep the name intact but strip dots, commas, digits and anything else that cannot be part of a name?
proper name: hellström
after my formula has been used: hellstrm
any help is appreciated
Upvotes: 0
Views: 895
Reputation: 56809
Before you proceed, read the obligatory article "Falsehoods Programmers Believe About Names". It's usually best to let users enter anything as their name (unless it is a system where the user's real name is compulsory and the name is later matched against a normalized database).
Back to the problem: there are two ways to represent ö, either as the single code point ö (U+00F6) or as two code points (o followed by the combining diaeresis U+0308).
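To make the two representations concrete, here is a minimal sketch that builds both forms and compares them. It assumes PHP 7.0 or later (for the `\u{...}` escape), and `countCodePoints` is a hypothetical helper written for this illustration, not a built-in function.

```php
<?php
// Two spellings of the same visible character "ö".
$precomposed = "\u{00F6}";   // single code point U+00F6
$decomposed  = "o\u{0308}";  // "o" followed by combining diaeresis U+0308

// Hypothetical helper: counts code points by matching each one in UTF-8 mode.
function countCodePoints(string $s): int {
    return (int) preg_match_all('/./su', $s);
}

// The strings render the same but are not byte-for-byte equal.
var_dump($precomposed === $decomposed);              // bool(false)
// UTF-8 byte lengths differ as well.
var_dump(strlen($precomposed), strlen($decomposed)); // int(2), int(3)
// One code point versus two.
var_dump(countCodePoints($precomposed));             // int(1)
var_dump(countCodePoints($decomposed));              // int(2)
```

Because the second code point of the decomposed form is a mark rather than a letter, a filter that only keeps letters would turn it into a plain "o".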
To allow letters from any language, you need to allow all characters in the Letter and Mark categories:
$efternamn = preg_replace('/[^\p{L}\p{M}]+/', '', $efternamn);
This method is quite crude, since it doesn't check whether the combining marks are placed properly or not.
If the input is UTF-8, add the u modifier (/[^\p{L}\p{M}]+/u) so that the pattern and the subject are treated as UTF-8 rather than as individual bytes.
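As a minimal sketch of the approach above, the following wraps the pattern in a function and uses the u modifier. `stripNonLetters` is a hypothetical name chosen for this example, and the code assumes the input is meant to be UTF-8. With u, `preg_replace` returns NULL when the subject is not valid UTF-8, which is one likely reason the whole name seemed to disappear in the question (for instance if the string was actually stored as Latin-1).

```php
<?php
// Hypothetical helper (not from the original post): keeps letters and
// combining marks and removes everything else. Assumes UTF-8 input.
function stripNonLetters(string $name): ?string {
    // The u modifier makes PCRE read the pattern and subject as UTF-8.
    // preg_replace returns null on error, e.g. when $name is invalid UTF-8.
    return preg_replace('/[^\p{L}\p{M}]+/u', '', $name);
}

// Precomposed ö survives; digits and punctuation are dropped.
var_dump(stripNonLetters("h3llstr\u{00F6}m, jr."));
// Decomposed ö (o + U+0308) survives because \p{M} is allowed.
var_dump(stripNonLetters("o\u{0308}") === "o\u{0308}");
// A Latin-1 encoded ö (single byte 0xF6) is not valid UTF-8, so the result is null.
var_dump(stripNonLetters("hellstr\xF6m"));
```

Note that this pattern also removes spaces, hyphens and apostrophes, so a name such as "Anna-Lena" would come back as "AnnaLena"; add those characters to the character class if they should be kept.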
Upvotes: 0
Reputation: 785146
You can use the Unicode property \P{L} to match every character that is not a letter. The u modifier makes PCRE treat the pattern and the subject as UTF-8:
$efternamn = preg_replace('/\P{L}+/u', '', $efternamn);
Upvotes: 3