user311509

Reputation: 2866

Count Number of Characters in a Mixed String of ASCII and Unicode

strlen($username);

A username can contain ASCII characters, non-ASCII Unicode characters, or both.

Example:

Jam123 (ASCII) - 6 characters
ابت (Unicode) - 3 characters, but strlen returns 6 because each of these characters takes 2 bytes in UTF-8.
Jamت (Unicode and ASCII) - 4 characters, but strlen returns 5 (3 bytes for the ASCII characters and 2 bytes for the single Unicode character).

In all cases the username must be at least 4 and at most 25 characters long.

My main problem is mixing Unicode and ASCII together: how can I keep track of the character count so the condition can decide whether the username is between 4 and 25 characters?

if(strlen($username) <= 25 && !(strlen($username) < 4))

Three Unicode characters are counted as 6 bytes, which causes trouble: the check accepts a username of 3 Unicode characters when the minimum should be 4 characters.
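
To make the problem concrete, here is a minimal sketch that runs the condition above against the three example names. It assumes the file and the strings are UTF-8 encoded; byte_length_check() is a hypothetical wrapper introduced only for illustration.

```php
<?php
// Assumes this file and the strings are UTF-8 encoded.
// byte_length_check() is a hypothetical wrapper around the condition from the question.
function byte_length_check($username)
{
    return strlen($username) <= 25 && !(strlen($username) < 4);
}

foreach (['Jam123', 'ابت', 'Jamت'] as $name) {
    // strlen() reports bytes, so 'ابت' shows up as 6 and passes the check.
    echo $name, ': ', strlen($name), ' bytes, ',
        byte_length_check($name) ? 'accepted' : 'rejected', "\n";
}
```

All three names are accepted, including the 3-character one, because the comparison is made on bytes rather than characters.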

Digits will always be ASCII.

Upvotes: 2

Views: 4070

Answers (3)

T.Todua

Reputation: 56341

A function to count the words in a Unicode sentence or string:

function mb_count_words($string)
{
    // Match runs of letters, digits and dash punctuation; each run counts as one word.
    preg_match_all('/[\p{L}\p{N}\p{Pd}]+/u', $string, $matches);
    return count($matches[0]);
}

or

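function mb_count_words($string, $format = 0, $charlist = '[]') {
    // $charlist is accepted but not used by this implementation.
    $string = trim($string);
    if ($string === '') {
        $words = array();
    } else {
        // Split on runs of anything that is not a letter, a digit or an apostrophe.
        $words = preg_split('~[^\p{L}\p{N}\']+~u', $string, -1, PREG_SPLIT_NO_EMPTY);
    }
    switch ($format) {
        case 0:
            // Format 0 returns the number of words.
            return count($words);
        default:
            // Any other format returns the list of words.
            return $words;
    }
}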


Then call it:

echo mb_count_words("chào buổi sáng");

Upvotes: 0

genesis

Reputation: 50966

You can use mb_strlen, which lets you specify the encoding of the string and counts characters instead of bytes.

http://sandbox.phpcode.eu/g/3a144/1

<?php 
echo mb_strlen('ابت', 'UTF8'); // returns 3
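
If every string in the application is UTF-8, the encoding can also be set once instead of on each call. A small sketch, assuming the mbstring extension is loaded and the source file is saved as UTF-8:

```php
<?php
// Assumes the mbstring extension is loaded and the input is UTF-8.
mb_internal_encoding('UTF-8');

// With no encoding argument, mb_strlen() uses the internal encoding set above.
$length = mb_strlen('ابت');
echo $length; // 3
```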

Upvotes: 1

Arnaud Le Blanc

Reputation: 99879

Use mb_strlen(). It counts characters rather than bytes, so each multibyte Unicode character is counted once.

Example:

mb_strlen("Jamت", "UTF-8"); // 4
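
One way to apply this to the 4 to 25 character rule from the question is sketched below. The function name and the two constants are hypothetical, and the sketch assumes usernames arrive as UTF-8 and that the mbstring extension is available.

```php
<?php
// Hypothetical helper; the names below are illustrative and not from the question.
const USERNAME_MIN = 4;
const USERNAME_MAX = 25;

function is_valid_username_length(string $username): bool
{
    // Reject input that is not valid UTF-8 rather than trying to count it.
    if (!mb_check_encoding($username, 'UTF-8')) {
        return false;
    }
    // Count characters (code points), not bytes.
    $length = mb_strlen($username, 'UTF-8');
    return $length >= USERNAME_MIN && $length <= USERNAME_MAX;
}

var_dump(is_valid_username_length('ابت'));    // bool(false): 3 characters
var_dump(is_valid_username_length('Jamت'));   // bool(true): 4 characters
var_dump(is_valid_username_length('Jam123')); // bool(true): 6 characters
```

Passing the encoding explicitly keeps the result independent of the server's mbstring configuration.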

Upvotes: 6
