Reputation: 11117
I'm using PDO to connect to a MySQL database. In my connection string I have already added charset=utf8mb4, and all of my databases and tables are utf8mb4_unicode_ci, but I'm facing a problem.
To search for entries by title in the content table, I'm using the query below:
SELECT * FROM content WHERE title LIKE '%سيگنالها%'
The keyword is a Persian word. The query above returns 1 result, which is correct and as expected.
But if I make a form in my PHP app and enter the SAME word, whether from a macOS/Windows PC or from an Android phone, I get 0 results.
I tracked this issue down, and it seems that even though the words entered by the user look exactly the same as the one already in the database, they are in fact NOT the same.
According to this online tool, the decimal character codes
for سيگنالها
are: 1587, 1610, 1711, 1606, 1575, 1604, 1607, 1575
while
for سیگنالها
they are: 1587, 1740, 1711, 1606, 1575, 1604, 1607, 1575
Did you spot the difference? It's the second code: 1610 in the first word versus 1740 in the second. In fact, if you copy both values and paste them in here, you will see the difference for yourself.
What can I do to solve this annoying problem? I'm using PHP 7 and MariaDB 10.1.
Upvotes: 1
Views: 234
Reputation: 1863
They are not the same character, even though they look the same when strung together and might even have the same meaning.
The character in the first string (1610, U+064A) is ARABIC LETTER YEH[2], while the one in the other string (1740, U+06CC) is ARABIC LETTER FARSI YEH[1].
[1] https://en.wiktionary.org/wiki/%DB%8C [2] https://en.wiktionary.org/wiki/%D9%8A
I also created a simple form for PHP and tested both strings to see if the value sent through $_POST is kept. Result: the value isn't converted.
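A check along these lines can confirm which code points actually arrive in the request. This is only a sketch: the helper name `codePoints` is made up for illustration, and it assumes PHP 7.2+ with the mbstring extension (for `mb_ord`).

```php
<?php
// Sketch: list the decimal code points of a UTF-8 string.
// Assumes PHP 7.2+ with the mbstring extension (for mb_ord).
// codePoints() is a hypothetical helper name, not part of PHP.
function codePoints(string $text): array
{
    // The /u flag makes preg_split operate on whole UTF-8 characters.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

    return array_map(function (string $char): int {
        return mb_ord($char, 'UTF-8');
    }, $chars);
}

// The two spellings from the question, written as escapes so the
// difference stays visible regardless of font.
$withArabicYeh = "\u{0633}\u{064A}\u{06AF}\u{0646}\u{0627}\u{0644}\u{0647}\u{0627}";
$withFarsiYeh  = "\u{0633}\u{06CC}\u{06AF}\u{0646}\u{0627}\u{0644}\u{0647}\u{0627}";

echo implode(', ', codePoints($withArabicYeh)), "\n"; // 1587, 1610, 1711, 1606, 1575, 1604, 1607, 1575
echo implode(', ', codePoints($withFarsiYeh)), "\n";  // 1587, 1740, 1711, 1606, 1575, 1604, 1607, 1575
```

In a form handler, the same helper could be called on the submitted value (for example a hypothetical `$_POST['q']` field) and logged next to the stored title.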
So what's probably going on is that you're using an Arabic keyboard to produce Farsi text. The recommended solution is some kind of normalization of the input.
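One possible shape for that normalization, as a minimal sketch: map the Arabic letter variants onto their Persian counterparts before the text is stored or searched. The function names `normalizePersian` and `findByTitle` are hypothetical, and the kaf/keheh mapping is an extra assumption that the question does not mention; only the yeh mapping comes from the thread.

```php
<?php
// Sketch: replace Arabic letter variants with the Persian forms.
// normalizePersian() is a hypothetical helper name.
function normalizePersian(string $text): string
{
    // strtr() with an array replaces whole byte sequences, which is safe
    // for UTF-8 because each key is a complete encoded character.
    return strtr($text, [
        "\u{064A}" => "\u{06CC}", // ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
        "\u{0643}" => "\u{06A9}", // ARABIC LETTER KAF -> ARABIC LETTER KEHEH (assumption)
    ]);
}

// Hypothetical search helper: normalizes the keyword, then binds it as a
// parameter instead of concatenating it into the SQL string.
function findByTitle(PDO $pdo, string $keyword): array
{
    $stmt = $pdo->prepare('SELECT * FROM content WHERE title LIKE ?');
    $stmt->execute(['%' . normalizePersian($keyword) . '%']);

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// The two spellings from the question become byte-identical.
$withArabicYeh = "\u{0633}\u{064A}\u{06AF}\u{0646}\u{0627}\u{0644}\u{0647}\u{0627}";
$withFarsiYeh  = "\u{0633}\u{06CC}\u{06AF}\u{0646}\u{0627}\u{0644}\u{0647}\u{0627}";

var_dump(normalizePersian($withArabicYeh) === normalizePersian($withFarsiYeh)); // bool(true)
```

Normalizing only the search input is not enough if the stored titles still contain the Arabic yeh: the same mapping would also have to be applied when rows are written, and once to the existing rows.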
See these discussions:
3) can't search in farsi text with arabic keyboard on iphone
Upvotes: 1
Reputation: 1052
The "ي" in your first word, "سيگنالها", is a different character from the "ی" in the second word, "سیگنالها".
First, ي: ARABIC LETTER YEH (U+064A)
Second, ی: ARABIC LETTER FARSI YEH (U+06CC)
They are different Unicode code points, so they do not match. Please see https://www.key-shortcut.com/en/writing-systems/%EF%BA%95%EF%BA%8F%D8%A2-arabic-alphabet/ for more information.
Upvotes: 1