Reputation: 85
According to the PostgreSQL 9.2 documentation, if I am using a locale other than the C locale (en_US.UTF-8 in my case), btree indexes on text columns for supporting queries like
SELECT * from my_table WHERE text_col LIKE 'abcd%'
need to be created using text_pattern_ops
like so
CREATE INDEX my_idx ON my_table (text_col text_pattern_ops)
Now section 11.9 of the documentation states that this results in a "character by character" comparison. Are these (non-wide) C characters or does the comparison understand UTF-8?
Upvotes: 4
Views: 1087
Reputation: 36729
The operators in the text_pattern_ops
operator class actually do a memcmp()
on the strings, so the documentation is perhaps slightly inaccurate talking about characters.
But this doesn't really affect the question whether they support UTF-8. The indexing of pattern matching operations in the described fashion does support UTF-8. The underlying operators don't have to worry about the encoding.
Upvotes: 2
Reputation: 75906
Good question, I'm not totally sure but my tentative understanding is:
Here Postgresql means "real characters" (eventually multibyte), not bytes. The comparison "understands UTF-8" always, with or without this special index.
The point is that, for locales that have special (non C) collation rules, we normally want to follow those rules (and call the respective locale libraries) when doing comparisons ( <
, >
...) and sorting. But we don't want to use those collations for POSIX regular matching and LIKE patterns. Hence the existence of two different types of indexes for text.
Upvotes: 2