Reputation: 339
I have a Java EE project using PostgreSQL 9.X and JPA2 (Hibernate implementation). How can I force a like query to be case insensitive and accent insensitive?
I'm able to change the charset of the DB because it's the first project using it.
Upvotes: 5
Views: 18594
Reputation: 21
I had this issue, and I couldn't use database functions. So instead I used a REGEX restriction in my criteria code:
searchText = unaccent(searchText);
String expression = "firstName ~* '.*" + searchText + ".*'";
Criterion searchCriteria = Restrictions.sqlRestriction(expression);
Then I wrote a function called unaccent to change each character to a or-statement, for example any letter e will become (e|é|è). A query for "hello" will become "h(e|é|è)llo".
Here is the function inspired from this thread Postgres accent insensitive LIKE search in Rails 3.1 on Heroku
private String unaccent(String text) {
String String charactersProcessed = ""; // To avoid doing a replace multiple times.
String newText = text.toLowerCase();
text = newText; // Case statement is expecting lowercase.
for (int i = 0; i < text.length(); i++) {
char c = text.charAt(i);
if (charactersProcessed.contains(c + "")) {
continue; // We have already processed this character.
}
String replacement = "";
switch (c) {
case '1': {
replacement = "¹";
break;
}
case '2': {
replacement = "²";
break;
}
case '3': {
replacement = "³";
break;
}
case 'a': {
replacement = "á|à|â|ã|ä|å|ā|ă|ą|À|Á|Â|Ã|Ä|Å|Ā|Ă|Ą|Æ";
break;
}
case 'c': {
replacement = "ć|č|ç|©|Ć|Č|Ç";
break;
}
case 'd': {
replacement = "Đ|Ð";
break;
}
case 'e': {
replacement = "è|é|ê|ё|ë|ē|ĕ|ė|ę|ě|È|Ê|Ë|Ё|Ē|Ĕ|Ė|Ę|Ě|€";
break;
}
case 'g': {
replacement = "ğ|Ğ";
break;
}
case 'i': {
replacement = "ı|ì|í|î|ï|ì|ĩ|ī|ĭ|Ì|Í|Î|Ï|Ї|Ì|Ĩ|Ī|Ĭ";
break;
}
case 'l': {
replacement = "ł|Ł";
break;
}
case 'n': {
replacement = "ń|ň|ñ|Ń|Ň|Ñ";
break;
}
case 'o': {
replacement = "ò|ó|ô|õ|ö|ō|ŏ|ő|ø|Ò|Ó|Ô|Õ|Ö|Ō|Ŏ|Ő|Ø|Œ";
break;
}
case 'r': {
replacement = "ř|®|Ř";
break;
}
case 's': {
replacement = "š|ş|ș|ß|Š|Ş|Ș";
break;
}
case 'u': {
replacement = "ù|ú|û|ü|ũ|ū|ŭ|ů|Ù|Ú|Û|Ü|Ũ|Ū|Ŭ|Ů";
break;
}
case 'y': {
replacement = "ý|ÿ|Ý|Ÿ";
break;
}
case 'z': {
replacement = "ž|ż|ź|Ž|Ż|Ź";
break;
}
}
if (!replacement.isEmpty()) {
charactersProcessed = charactersProcessed + c;
newText = newText.replace(c + "", "(" + c + "|" + replacement + ")");
}
}
return newText;
}
Upvotes: 0
Reputation: 125214
If the unaccent extension is installed:
select unaccent(lower('ãóÊ'));
unaccent
----------
aoe
Upvotes: 3
Reputation: 324385
In general there is no standard way to write "accent-insensitive" code, or to compare words for equality while ignoring accents. The whole idea makes very little sense, as different accented characters mean different things in different languages/dialects, and their "plain ascii" substitutions/expansions vary by language. Please don't do this; resume
and résumé
are different words, and the situation gets even worse when considering any language(s) other than English.
For case-insensitivity you can use lower(the_col) like lower('%match_expression')
in JPQL. As far as I know ilike
isn't supported in JPQL, but I have not checked the standard to verify this. It's fairly readable, so consider just downloading the JPA2 spec and reading it. JPA2 Criteria offers Restrictions.ilike
for the purpose. Neither will normalize/strip/ignore accented characters.
For stripping accents, etc, you will probably need to use database-engine specific stored functions or native queries. See, eg this prior answer, or if you intended to substitute accented characters with an unaccented alternative this PostgreSQL wiki entry - but again, please don't do this except for very limited purposes like finding places where words may've been "unaccented" by misguided software or users.
Upvotes: 5