Reputation: 2866
strlen($username);
A username can contain ASCII, Unicode, or both.
Example:
Jam123 (ASCII) - 6 characters
ابت (Unicode) - 3 characters, but strlen returns 6 because each of these Arabic letters takes 2 bytes in UTF-8.
Jamت (ASCII and Unicode) - 4 characters, but strlen returns 5 (3 bytes for the ASCII letters plus 2 bytes for the single Arabic letter).
In all cases, the username must be no longer than 25 characters and no shorter than 4.
My main problem is mixing Unicode and ASCII: how can I count the characters correctly so the condition can decide whether the username is at most 25 and at least 4 characters long?
if(strlen($username) <= 25 && !(strlen($username) < 4))
Because strlen counts bytes, 3 Unicode characters are counted as 6, so a username of only 3 Unicode characters passes the check even though the minimum is 4.
Numbers will always be ASCII.
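A quick way to see the mismatch described above is to print both strlen() (bytes) and mb_strlen() (characters) for the three example usernames. This is a minimal sketch that assumes the file is saved as UTF-8 and the mbstring extension is enabled.

```php
<?php
// Compare byte length (strlen) with character length (mb_strlen)
// for the example usernames; assumes this file is saved as UTF-8.
$examples = ['Jam123', 'ابت', 'Jamت'];

foreach ($examples as $name) {
    printf(
        "%s: strlen = %d bytes, mb_strlen = %d characters\n",
        $name,
        strlen($name),
        mb_strlen($name, 'UTF-8')
    );
}
```

For 'ابت' this reports 6 bytes but 3 characters, which is exactly why the strlen-based condition lets it through.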
Upvotes: 2
Views: 4070
Reputation: 56341
A function to count words in a Unicode sentence/string:
function mb_count_words($string)
{
    // Count runs of letters (\pL), digits (\pN) and dash punctuation (\p{Pd}).
    preg_match_all('/[\pL\pN\p{Pd}]+/u', $string, $matches);
    return count($matches[0]);
}
or
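function mb_count_words($string, $format = 0, $charlist = '[]')
{
    // $charlist mirrors str_word_count()'s signature but is not used here.
    $string = trim($string);
    if ($string === '') {
        $words = array();
    } else {
        // Split on anything that is not a letter, digit or apostrophe;
        // PREG_SPLIT_NO_EMPTY drops empty pieces left by leading/trailing punctuation.
        $words = preg_split('~[^\p{L}\p{N}\']+~u', $string, -1, PREG_SPLIT_NO_EMPTY);
    }
    switch ($format) {
        case 0:
            return count($words);
        default:
            // Formats 1 and 2 (and anything else) return the list of words.
            return $words;
    }
}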
Then call it:
echo mb_count_words("chào buổi sáng");
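Note that these functions count words, not characters, so on their own they do not enforce the 4 to 25 character limit from the question. If you prefer to stay with the preg_match_all() approach, a minimal sketch that counts characters (code points) instead might look like this; the helper name count_utf8_chars is hypothetical.

```php
<?php
// Hypothetical helper: counts Unicode code points by matching every
// character with the /u (UTF-8) and /s (dot also matches newlines) modifiers.
function count_utf8_chars(string $string): int
{
    // preg_match_all() returns the number of full matches, or false on
    // error (e.g. invalid UTF-8 input), which is mapped to -1 here.
    $count = preg_match_all('/./su', $string);
    return $count === false ? -1 : $count;
}

echo count_utf8_chars('Jamت'); // 4
```

In practice mb_strlen() does the same job more directly; the regex version is mainly useful if mbstring is unavailable.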
Upvotes: 0
Reputation: 50966
You can use mb_strlen(), which lets you specify the encoding.
http://sandbox.phpcode.eu/g/3a144/1
<?php
echo mb_strlen('ابت', 'UTF-8'); // prints 3
Upvotes: 1
Reputation: 99879
Use mb_strlen(). It counts characters in the given encoding rather than bytes, so multibyte (Unicode) characters are handled correctly.
Example:
mb_strlen("Jamت", "UTF-8"); // 4
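Applied to the limits from the question (at least 4 and at most 25 characters), the original strlen() condition could be replaced with something like the following sketch. The function name and its default bounds are illustrative assumptions, not part of any library.

```php
<?php
// Hypothetical helper: checks that a username is between $min and $max
// characters long, counting characters (code points) rather than bytes.
function is_valid_username_length(string $username, int $min = 4, int $max = 25): bool
{
    $length = mb_strlen($username, 'UTF-8');
    return $length >= $min && $length <= $max;
}

var_dump(is_valid_username_length('Jam123')); // bool(true)
var_dump(is_valid_username_length('ابت'));    // bool(false): only 3 characters
var_dump(is_valid_username_length('Jamت'));   // bool(true)
```

mb_strlen() counts code points, so a character written with combining marks counts as more than one; if that matters, grapheme_strlen() from the intl extension counts user-perceived characters instead.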
Upvotes: 6