user1180339
user1180339

Reputation: 339

How to do a like case-insensitive and accent insensitive in PostgreSQL and JPA 2?

I have a Java EE project using PostgreSQL 9.X and JPA2 (Hibernate implementation). How can I force a like query to be case insensitive and accent insensitive?

I'm able to change the charset of the DB because it's the first project using it.

Upvotes: 5

Views: 18594

Answers (3)

motus
motus

Reputation: 21

I had this issue, and I couldn't use database functions. So instead I used a REGEX restriction in my criteria code:

searchText = unaccent(searchText);
String expression = "firstName ~* '.*" + searchText + ".*'";
Criterion searchCriteria = Restrictions.sqlRestriction(expression);

Then I wrote a function called unaccent to change each character to a or-statement, for example any letter e will become (e|é|è). A query for "hello" will become "h(e|é|è)llo".

Here is the function inspired from this thread Postgres accent insensitive LIKE search in Rails 3.1 on Heroku

private String unaccent(String text) {
    String String charactersProcessed = ""; // To avoid doing a replace multiple times.
    String newText = text.toLowerCase();
    text = newText; // Case statement is expecting lowercase.
    for (int i = 0; i < text.length(); i++) {
        char c = text.charAt(i);
        if (charactersProcessed.contains(c + "")) {
            continue; // We have already processed this character.
        }
        String replacement = "";
        switch (c) {
        case '1': {
            replacement = "¹";
            break;
        }
        case '2': {
            replacement = "²";
            break;
        }
        case '3': {
            replacement = "³";
            break;
        }
        case 'a': {
            replacement = "á|à|â|ã|ä|å|ā|ă|ą|À|Á|Â|Ã|Ä|Å|Ā|Ă|Ą|Æ";
            break;
        }
        case 'c': {
            replacement = "ć|č|ç|©|Ć|Č|Ç";
            break;
        }
        case 'd': {
            replacement = "Đ|Ð";
            break;
        }
        case 'e': {
            replacement = "è|é|ê|ё|ë|ē|ĕ|ė|ę|ě|È|Ê|Ë|Ё|Ē|Ĕ|Ė|Ę|Ě|€";
            break;
        }
        case 'g': {
            replacement = "ğ|Ğ";
            break;
        }
        case 'i': {
            replacement = "ı|ì|í|î|ï|ì|ĩ|ī|ĭ|Ì|Í|Î|Ï|Ї|Ì|Ĩ|Ī|Ĭ";
            break;
        }
        case 'l': {
            replacement = "ł|Ł";
            break;
        }
        case 'n': {
            replacement = "ń|ň|ñ|Ń|Ň|Ñ";
            break;
        }
        case 'o': {
            replacement = "ò|ó|ô|õ|ö|ō|ŏ|ő|ø|Ò|Ó|Ô|Õ|Ö|Ō|Ŏ|Ő|Ø|Œ";
            break;
        }
        case 'r': {
            replacement = "ř|®|Ř";
            break;
        }
        case 's': {
            replacement = "š|ş|ș|ß|Š|Ş|Ș";
            break;
        }
        case 'u': {
            replacement = "ù|ú|û|ü|ũ|ū|ŭ|ů|Ù|Ú|Û|Ü|Ũ|Ū|Ŭ|Ů";
            break;
        }
        case 'y': {
            replacement = "ý|ÿ|Ý|Ÿ";
            break;
        }
        case 'z': {
            replacement = "ž|ż|ź|Ž|Ż|Ź";
            break;
        }
        }
        if (!replacement.isEmpty()) {
            charactersProcessed = charactersProcessed + c;
            newText = newText.replace(c + "", "(" + c + "|" + replacement + ")");
        }
    }

    return newText;
}

Upvotes: 0

Clodoaldo Neto
Clodoaldo Neto

Reputation: 125214

If the unaccent extension is installed:

select unaccent(lower('ãóÊ'));
 unaccent 
----------
 aoe

Upvotes: 3

Craig Ringer
Craig Ringer

Reputation: 324385

In general there is no standard way to write "accent-insensitive" code, or to compare words for equality while ignoring accents. The whole idea makes very little sense, as different accented characters mean different things in different languages/dialects, and their "plain ascii" substitutions/expansions vary by language. Please don't do this; resume and résumé are different words, and the situation gets even worse when considering any language(s) other than English.

For case-insensitivity you can use lower(the_col) like lower('%match_expression') in JPQL. As far as I know ilike isn't supported in JPQL, but I have not checked the standard to verify this. It's fairly readable, so consider just downloading the JPA2 spec and reading it. JPA2 Criteria offers Restrictions.ilike for the purpose. Neither will normalize/strip/ignore accented characters.

For stripping accents, etc, you will probably need to use database-engine specific stored functions or native queries. See, eg this prior answer, or if you intended to substitute accented characters with an unaccented alternative this PostgreSQL wiki entry - but again, please don't do this except for very limited purposes like finding places where words may've been "unaccented" by misguided software or users.

Upvotes: 5

Related Questions