Reputation: 309
I am trying to filter Turkish names from a MySQL database through an AJAX POST. Words containing only English letters are listed correctly, but if I send Ö (the letter O with dots), the results include matches for both O and Ö, not only Ö.
I also noticed that the AJAX POST sends Ö as %C3%96. Can anybody help?
Upvotes: 4
Views: 1362
Reputation: 142298
The PHP code should be receiving %C3%96 suitably decoded back to Ö. But if not, then apply the PHP function urldecode() to the string.
You will still have the character Ö, not O; is that OK?
If you get Ã– instead, then there is a mixture of utf8 and latin1. That is a different problem.
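As a rough illustration of the decoding step, the sketch below runs urldecode() on the escaped value and inspects the resulting bytes. The variable names are made up for the example, and $raw stands in for whatever your script actually receives.

```php
<?php
// Hypothetical raw value, as it appears in the AJAX POST body on the wire.
$raw = '%C3%96';

// urldecode() turns each %XX escape back into the byte it stands for.
$decoded = urldecode($raw);

// Inspect the bytes rather than the glyph, so the terminal encoding cannot mislead.
$hex = bin2hex($decoded);       // hex dump of the decoded bytes
$byteCount = strlen($decoded);  // strlen() counts bytes, not characters

echo $hex, ' (', $byteCount, " bytes)\n"; // prints: c396 (2 bytes)
```

PHP normally decodes $_POST values for you, so an extra urldecode() call is only needed if you really do see the %XX form in your script; applied to already-decoded input it would also turn a literal + into a space.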
Upvotes: 0
Reputation: 96159
Please bear with my somewhat lengthy response.
Let's start with your second question. %C3%96 means that the bytes 0xC3 and 0x96 are transmitted. Those two bytes encode the character Ö in utf-8.
From this (and from the fact that your query yields the described results) I assume that you're using utf-8 all the way through.
The lexicographical order of characters of a given charset is determined by the collation used.
That's more or less an ordered list of characters. E.g. A,B,C,D,.... meaning A<B<C<D....
But these lists may contain multiple characters in the same "location", e.g.
[A,Ä],B,C,D.... meaning that A==Ä->true
___ excursion, not immediately relevant to your question ____
Let's take a look at the "name" of the character Ö: it's LATIN CAPITAL LETTER O WITH DIAERESIS.
So, the base character is O, it just has some decoration(s).
Some systems/libraries allow you to specify the "granularity"/level/strength of the comparison, see e.g. Collator::setStrength of the php-intl extension.
<?php
// utf8 characters
define('SMALL_O_WITH_DIAERESIS', chr(0xC3) . chr(0xB6));
define('CAP_O_WITH_DIAERESIS', chr(0xC3) . chr(0x96));
// collator_create() expects a locale, not a charset name
$coll = collator_create( 'en_US' );
foreach( array('PRIMARY', 'SECONDARY', 'TERTIARY') as $strength) {
echo $strength, "\r\n";
$coll->setStrength( constant('Collator::'.$strength) );
echo ' o ~ ö = ', $coll->compare('o', SMALL_O_WITH_DIAERESIS), "\r\n";
echo ' Ö ~ ö = ', $coll->compare(CAP_O_WITH_DIAERESIS, SMALL_O_WITH_DIAERESIS), "\r\n";
}
prints
PRIMARY
o ~ ö = 0
Ö ~ ö = 0
SECONDARY
o ~ ö = -1
Ö ~ ö = 0
TERTIARY
o ~ ö = -1
Ö ~ ö = 1
On the primary level all the involved characters (o,O,ö,Ö) are just some irrelevant variations of the character O, so all are regarded as equal.
On the secondary level the additional "feature" WITH DIAERESIS is taken into consideration, and on the third level also whether it is a small or a capital letter.
But ...MySQL doesn't exactly work that way ...so, sorry again ;-)
___ end of excursion ____
In MySQL there are collation tables that specify the order. When you select a charset you also implicitly select the default collation for that charset, unless you explicitly specify one. In your case the implicitly selected collation is probably utf8_general_ci and it treats ö==o.
This applies to both the table definition and the charset/collation of the connection (the latter being almost irrelevant in your case).
utf8_turkish_ci on the other hand treats ö!=o. That's probably the collation you want.
When you have a table definition like
CREATE TABLE soFoo (
x varchar(32)
)
CHARACTER SET utf8
the default collation for utf8 is chosen -> general_ci -> o=ö
You can specify the default collation for the table when defining it
CREATE TABLE soFoo (
x varchar(32)
)
CHARACTER SET utf8 COLLATE utf8_turkish_ci
Since you already have a table plus data, you can change the collation of the table ...but if you do it on the table level you have to use ALTER TABLE ... CONVERT (in case you use MODIFY, the column keeps its "original" collation).
ALTER TABLE soFoo CONVERT TO CHARACTER SET utf8 COLLATE utf8_turkish_ci
That should pretty much take care of your problem.
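For what it's worth, here is a minimal sketch of what the filter query might look like once the column uses utf8_turkish_ci. It assumes the filter is a prefix match with LIKE, which the question doesn't actually state. soFoo and x are the table and column from the example above; likePrefix() and findByPrefix() are hypothetical helper names, not part of any library.

```php
<?php
// Hypothetical helper: turn user input into a LIKE prefix pattern.
// Escapes the LIKE wildcards (% and _) and the escape character itself,
// so the input is matched literally, then appends the trailing wildcard.
function likePrefix(string $prefix): string {
    return addcslashes($prefix, '%_\\') . '%';
}

// Hypothetical helper: return all values of soFoo.x that start with $prefix.
// The comparison is expected to use the column's collation, so with
// utf8_turkish_ci a prefix of Ö should no longer match names starting with O.
function findByPrefix(PDO $pdo, string $prefix): array {
    $stmt = $pdo->prepare('SELECT x FROM soFoo WHERE x LIKE ?');
    $stmt->execute([likePrefix($prefix)]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

With a connection at hand, usage would be something like findByPrefix($pdo, $_POST['letter']), where 'letter' is again only an assumed field name.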
As a side note there is (as mentioned) a collation assigned to your connection as well. Selecting a charset means selecting a collation. I use mainly PDO when (directly) connecting to MySQL and my default connection code looks like this
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'localonly', 'localonly', array(
PDO::ATTR_EMULATE_PREPARES=>false,
PDO::MYSQL_ATTR_DIRECT_QUERY=>false,
PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION
));
Note the charset=utf8; no collation is given, so again general_ci is assigned to the connection. And that's why
<?php
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'localonly', 'localonly', array(
PDO::ATTR_EMULATE_PREPARES=>false,
PDO::MYSQL_ATTR_DIRECT_QUERY=>false,
PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION
));
$smallodiaresis_utf8 = chr(0xC3) . chr(0xB6);
foreach( $pdo->query("SELECT 'o'='$smallodiaresis_utf8'") as $row ) {
echo $row[0];
}
prints 1, meaning o==ö. The string literals used in the statement are treated as utf8/utf8_general_ci.
I could either specify the collation for the string literal explicitly in the statement
SELECT 'o' COLLATE utf8_turkish_ci ='ö'
(only setting it for one of the two literals/operands; for why and how this works see Collation of Expressions)
or I can set the connection collation via
$pdo->exec("SET collation_connection='utf8_turkish_ci'");
both result in
foreach( $pdo->query("SELECT 'o'[...]='$smallodiaresis_utf8'") as $row ) {
echo $row[0];
}
printing 0.
edit: and to complicate things even a bit further:
The charset utf8 can't represent all possible characters. There's an even broader character set, utf8mb4.
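To make that a bit more concrete: MySQL's utf8 stores at most three bytes per character, while UTF-8 proper needs four bytes for characters outside the Basic Multilingual Plane. The sketch below checks a string for such characters; needsUtf8mb4() is a made-up helper name, and it assumes its input is valid UTF-8.

```php
<?php
// Hypothetical helper: true if the (valid UTF-8) string contains a character
// that takes 4 bytes, i.e. one that MySQL's 3-byte utf8 cannot store.
function needsUtf8mb4(string $s): bool {
    // In valid UTF-8, bytes 0xF0-0xF7 only occur as the lead byte of a
    // 4-byte sequence. The pattern has no /u flag, so it matches raw bytes.
    return preg_match('/[\xF0-\xF7]/', $s) === 1;
}

var_dump(needsUtf8mb4("\u{D6}"));    // Ö is 2 bytes        -> bool(false)
var_dump(needsUtf8mb4("\u{1F600}")); // U+1F600 is 4 bytes  -> bool(true)
```

Turkish letters all fit into three bytes, so this shouldn't matter for the names themselves; it only becomes relevant if the column may also hold things like emoji.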
Upvotes: 2