Simon

Reputation: 23141

strtolower() for unicode/multibyte strings

I have some text in a non-English language on my page, but when I try to make it lowercase, its characters are converted into black diamonds containing question marks.

$a = "Երկիր Ավելացնել";
echo $b = strtolower($a);
//returns  ����� ���������

I've set my charset in a meta tag, but this didn't fix it.

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

What can I do to convert my string to lowercase without corrupting it?

Upvotes: 35

Views: 28103

Answers (8)

Kevin

Reputation: 13226

Have you tried mb_strtolower() and specifying the encoding as the second parameter?

The examples on that page appear to work.

You could also try:

$str = mb_strtolower($str, mb_detect_encoding($str));

Upvotes: 5

SteelBytes

Reputation: 6965

Have you tried using mb_strtolower()?

Upvotes: 77
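A minimal sketch of that suggestion applied to the string from the question, assuming the mbstring extension is loaded and the source file is saved as UTF-8. The encoding is passed explicitly so the result does not depend on the configured internal encoding.

```php
<?php
// Assumes the mbstring extension is enabled and this file is saved as UTF-8.
$a = "Երկիր Ավելացնել";

// Pass the encoding explicitly instead of relying on mb_internal_encoding().
$b = mb_strtolower($a, 'UTF-8');

echo $b, PHP_EOL; // երկիր ավելացնել
```

Only the two capital letters change here; the rest of the string was already lowercase.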

khaled_webdev

Reputation: 1430

I found this solution:

$string = 'Թ';
// Pass the encoding explicitly so the result does not depend on the internal encoding.
echo 'Uppercase: '.mb_convert_case($string, MB_CASE_UPPER, "UTF-8").PHP_EOL;
echo 'Lowercase: '.mb_convert_case($string, MB_CASE_LOWER, "UTF-8").PHP_EOL;
echo 'Original: '.$string.PHP_EOL;

It works for me (lowercase).

Upvotes: 10

SWilk

Reputation: 3464

PHP's strtolower() does not know about UTF-8. It processes a string one byte at a time and only converts bytes that the current single-byte locale (or, in newer PHP versions, plain ASCII) treats as uppercase letters. Non-ASCII letters in UTF-8 are written with two or more bytes, so strtolower() handles each byte separately, and if a byte happens to match an uppercase letter code in that locale, it is changed. As a result the sequence is broken and no longer represents a valid character.

To change this you need to configure the mbstring extension:

http://www.php.net/manual/en/book.mbstring.php

to replace strtolower with mb_strtolower, or use mb_strtolower directly. In any case, you need to spend some time configuring the mbstring settings to match your requirements.
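To see why a byte-oriented function cannot lowercase this text, the sketch below inspects the bytes of a single Armenian letter. It assumes the mbstring extension is loaded and the file is saved as UTF-8; the variable names are only illustrative.

```php
<?php
// Assumes the mbstring extension is enabled and this file is saved as UTF-8.
$upper = "Ե"; // U+0535, ARMENIAN CAPITAL LETTER ECH

// strlen() counts bytes, mb_strlen() counts characters.
$bytes = strlen($upper);             // 2
$chars = mb_strlen($upper, 'UTF-8'); // 1

// Hex dump of the letter before and after a multibyte-aware conversion.
$hexUpper = bin2hex($upper);                         // d4b5
$hexLower = bin2hex(mb_strtolower($upper, 'UTF-8')); // d5a5

printf("%d bytes, %d character: %s -> %s\n", $bytes, $chars, $hexUpper, $hexLower);
```

Both bytes are above 0x7F and both differ between the uppercase and lowercase forms, so the conversion has to treat the two-byte sequence as one character.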

Upvotes: 3

intuited

Reputation: 24044

You will need to set the locale; see the first example at https://www.php.net/manual/en/function.strtolower.php

Upvotes: 0

reko_t

Reputation: 56430

PHP 5's built-in string functions are not UTF-8 aware, so you still need to resort to the mbstring extension. I suggest you set mbstring's internal encoding to UTF-8; then you can freely use its functions without specifying the charset every time:

mb_internal_encoding('UTF-8');

...

$b = mb_strtolower($a);
echo $b;

Upvotes: 22

Pekka

Reputation: 449495

strtolower() will perform the conversion in the currently selected locale only.

I would try mb_convert_case(). Make sure you explicitly specify an encoding.

Upvotes: 1

Powerlord

Reputation: 88796

Use mb_strtolower instead, as strtolower doesn't work on multi-byte characters.

Upvotes: 2
