Andrey Fedorov
Andrey Fedorov

Reputation: 9669

What does the COLLATE keyword do when creating a sqlite index?

According to the sqlite3 documentation,

The COLLATE clause following each column name defines a collating sequence used for text entries in that column. The default collating sequence is the collating sequence defined for that column in the CREATE TABLE statement. Or if no collating sequence is otherwise defined, the built-in BINARY collating sequence is used.

What does a collating sequence do, and what is a BINARY collating sequence?

Upvotes: 21

Views: 26698

Answers (3)

Wolf
Wolf

Reputation: 10238

Collation does not only affect ordering in indexes or results, but may also affect the validity of certain operations on tables. So you might find it appropriate to define a case-insensitive text column as primary key (or with UNIQUE constraint), like this:

CREATE TABLE t1 (
  id TEXT COLLATE NOCASE PRIMARY KEY
);

After inserting an id value 'a',

INSERT INTO t1 VALUES ('a');

it's not possible to add another id with value 'A' (differing just by case). The following SQL

INSERT INTO t1 VALUES ('A');

will fail with

Runtime error: UNIQUE constraint failed: t1.id (19)

It's nevertheless possible to change the id value from lowercase to uppercase:

UPDATE t1 SET id = 'A' WHERE id = 'A';

… which is quite interesting because the where clause actually works case-insensitively and the change is made from something A-like to a proper A.

Upvotes: 0

u07ch
u07ch

Reputation: 13702

It is the way that the sql engine orders data internally. Binary Collation does what it suggests, it does a binary comparison. Generally its the fastest collation though I have never quantified it, as it checks bit patterns, which means it is case and accent sensitive.

Upvotes: 9

live-love
live-love

Reputation: 52366

Binary collation compares your string byte by byte, as in an unicode table. For example: A,B,a,b. A case insensitive (NOCASE) order would be: a,A,b,B.

The advantage of binary collation is its speed, as string comparison is very simple/fast. In the general case, indexes with binary might not produce expected results for sort; however, for exact matches they can be useful.

COLLATE NOCASE also affects case sensitive queries.

If you have a column with these values: 'aa', 'aA'

select * from table where col = 'aa'

If you have created your column with COLLATE NOCASE it will return both 'aa' and 'aA'. Otherwise, if you didn't specify it, it will return only 'aa'.

You can also specify it in a query (this is slower then if you had created your column with COLLATE NOCASE)

select * from table where col = 'aa' COLLATE NOCASE

Upvotes: 8

Related Questions