Do text_pattern_ops comparators understand UTF-8?

Question

According to the PostgreSQL 9.2 documentation, if I am using a locale other than the C locale (en_US.UTF-8 in my case), btree indexes on text columns for supporting queries like

SELECT * from my_table WHERE text_col LIKE 'abcd%'

need to be created using text_pattern_ops like so

CREATE INDEX my_idx ON my_table (text_col text_pattern_ops)

Now section 11.9 of the documentation states that this results in a "character by character" comparison. Are these (non-wide) C characters or does the comparison understand UTF-8?

leonbloy · Accepted Answer

Good question, I'm not totally sure but my tentative understanding is:

Here Postgresql means "real characters" (eventually multibyte), not bytes. The comparison "understands UTF-8" always, with or without this special index.

The point is that, for locales that have special (non C) collation rules, we normally want to follow those rules (and call the respective locale libraries) when doing comparisons ( <, >...) and sorting. But we don't want to use those collations for POSIX regular matching and LIKE patterns. Hence the existence of two different types of indexes for text.

Do text_pattern_ops comparators understand UTF-8?

Answers (2)

Related Questions