Reputation: 997
Which collation available in H2 Database does not ignore spaces, yet treats characters with and without umlauts as the same?
For example, it should treat "Ilkka Seppälä" and "Ilkka Seppala" as the same. It also needs to treat "MSaifAsif" and "M Saif Asif" as different (because of the spaces).
Upvotes: 1
Views: 2132
Reputation: 997
I found the answer to my question. To get the desired behavior, I had to do two things:
1. Add icu4j as a dependency to the project, which makes H2 use the ICU4J collator:
testCompile 'com.ibm.icu:icu4j:55.1'
This is mentioned in the documentation (H2 DB Reference - SET COLLATION), although it does not explain the difference between the default collator and ICU4J's.
2. Add SET COLLATION ENGLISH STRENGTH PRIMARY to the JDBC URL:
jdbc:h2:mem:test;MODE=MySQL;INIT=CREATE SCHEMA IF NOT EXISTS "public"\;SET COLLATION ENGLISH STRENGTH PRIMARY
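When the URL is built in Java code, the backslash in front of the semicolon has to be escaped itself, because H2 uses \; to separate multiple statements inside INIT. A minimal sketch, assuming a hypothetical helper class H2UrlBuilder (the class and method names are mine, not part of H2):

```java
public class H2UrlBuilder {

    // Hypothetical helper, not an H2 API: appends an INIT setting to a base URL
    // and joins the statements with "\;" so H2 does not read the semicolon
    // as the end of the INIT setting.
    static String buildUrl(String baseUrl, String... initStatements) {
        return baseUrl + ";INIT=" + String.join("\\;", initStatements);
    }

    public static void main(String[] args) {
        String url = buildUrl(
                "jdbc:h2:mem:test;MODE=MySQL",
                "CREATE SCHEMA IF NOT EXISTS \"public\"",
                "SET COLLATION ENGLISH STRENGTH PRIMARY");
        // Prints the same URL as the one shown above.
        System.out.println(url);
    }
}
```

The resulting string can be passed to DriverManager.getConnection or set as the datasource URL in the test configuration.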
A snippet of my unit test, which passes after adding ICU4J:
@Test
public void testUnicode() throws Exception {
    Author authorWithUnicode = new Author();
    authorWithUnicode.setName("Ilkka Seppälä");
    authorRepository.save(authorWithUnicode);

    Author authorWithSpaces = new Author();
    authorWithSpaces.setName("M Saif Asif");
    authorRepository.save(authorWithSpaces);

    assertThat(authorRepository.findByName("Ilkka Seppälä").get()).isNotNull();
    assertThat(authorRepository.findByName("Ilkka Seppala").get()).isNotNull();
    assertThat(authorRepository.findByName("M Saif Asif").get()).isNotNull();
    assertThat(authorRepository.findByName("MSaifAsif")).isEqualTo(Optional.empty());
}
Previously, without ICU4J, if H2 was initialized with SET COLLATION ENGLISH STRENGTH PRIMARY, the fourth assertion failed because the default collator treated the string with spaces as equal to the one without spaces. Without SET COLLATION at all, the second assertion failed because the name containing "ä" was treated as different from the one containing a plain "a".
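The effect of the strength setting can be tried outside H2 with the JDK's own java.text.Collator, which I assume is the collator H2 falls back to when ICU4J is not on the classpath. This is only a sketch for experimenting; the class name CollatorStrengthDemo and the helper sameAt are made up for illustration:

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorStrengthDemo {

    // Returns true when the JDK collator for Locale.ENGLISH considers the two
    // strings equal at the given strength (Collator.PRIMARY, TERTIARY, ...).
    static boolean sameAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file independent of its encoding.
        String withUmlaut = "Ilkka Sepp\u00e4l\u00e4";
        String plain = "Ilkka Seppala";

        // PRIMARY ignores accent differences; TERTIARY does not.
        System.out.println("umlaut, PRIMARY:  " + sameAt(Collator.PRIMARY, withUmlaut, plain));
        System.out.println("umlaut, TERTIARY: " + sameAt(Collator.TERTIARY, withUmlaut, plain));

        // The spaces case from the question; the result shows whether the
        // JDK collator ignores spaces at PRIMARY strength.
        System.out.println("spaces, PRIMARY:  " + sameAt(Collator.PRIMARY, "M Saif Asif", "MSaifAsif"));
    }
}
```

With icu4j on the classpath, the same comparison can be repeated with com.ibm.icu.text.Collator to see how the two collators differ on the spaces case.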
Upvotes: 3