andreih
andreih

Reputation: 433

Firebird 2.5.6 romanian collation for UTF8 charset

I have a table in Firebird 2.5.6. It has the UTF8 charset and contains Romanian characters. However, when I want to order the characters by Romanian logic, the result is wrong.

An example of this issue can be found below:

CREATE TABLE TEST (
  ID INTEGER DEFAULT 0 NOT NULL,
  TXT VARCHAR(20) CHARACTER SET UTF8 DEFAULT '-' NOT NULL COLLATE UTF8,
  PRIMARY KEY (ID)
);

Assuming I insert the following rows:

INSERT INTO TEST (ID, TXT) VALUES (1, 'S');
INSERT INTO TEST (ID, TXT) VALUES (2, 'SA');
INSERT INTO TEST (ID, TXT) VALUES (3, 'Ș');
INSERT INTO TEST (ID, TXT) VALUES (4, 'ȘA');
INSERT INTO TEST (ID, TXT) VALUES (5, 'T');
INSERT INTO TEST (ID, TXT) VALUES (6, 'TA');
INSERT INTO TEST (ID, TXT) VALUES (7, 'Ț');
INSERT INTO TEST (ID, TXT) VALUES (8, 'ȚA');

When I run the following query, the result is not what i would expect.

select id, txt from test order by txt

returns:

ID  TXT
1   S
2   SA
5   T
6   TA
3   Ș
4   ȘA
7   Ț
8   ȚA

But the correct ordering for Romanian is:

ID  TXT
1   S
2   SA
3   Ș
4   ȘA
5   T
6   TA
7   Ț
8   ȚA

Can this be done without changing the charset from UTF8 ?

Upvotes: 2

Views: 1364

Answers (2)

ddmmaaxx
ddmmaaxx

Reputation: 1

By the Firebird 2.5 Language Reference, you have to use character set UTF8 with collation UTF8, which supports all the languages supported by the Unicode 4.0.

If your db is for use only in your country, you better adopt character set ISO8859_2 with the same-named collation, being only 1byte-per-character wide instead of 4 bytes for UTF8.

Upvotes: 0

Rick James
Rick James

Reputation: 142296

(This answer may not apply; it refers to MySQL collations.)

Unicode defines a general collation that tends to be "wrong" in some way for every language. You have hit such.

Switch to utf8_romanian_ci to the the ordering you desire. That COLLATION has been available in MySQL for, I think, at least a decade.

Note the distinction between these two letters: Ș and Ş

C59E        350=x015E  [Ş]    L  LATIN CAPITAL LETTER S WITH CEDILLA
C898        536=x0218  [Ș]    L  LATIN CAPITAL LETTER S WITH COMMA BELOW

I point out here that utf8_romanian_ci (also polish and turkish) correctly sorts Ş, but my list fails to list the other flavor of S.

Upvotes: 0

Related Questions