Saqib Ali

Reputation: 4448

Detecting a utf8mb4 charset requirement

We have a MySQL database that only supports utf8, but some of the data feeds we receive contain data that requires utf8mb4 to be stored in MySQL. How can we detect (in Java) whether a string will require the utf8mb4 charset?

Upvotes: 2

Views: 2514

Answers (1)

Joni

Reputation: 111349

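Characters that require utf8mb4 are those outside the Basic Multilingual Plane (code points above U+FFFF), which take 4 bytes in UTF-8. Java represents each of them as a surrogate pair, so each occupies 2 chars. A simple way to detect them is therefore to check whether the length of the string in chars differs from the number of code points: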

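// Returns true if s contains at least one surrogate pair,
// i.e. a character that needs 4 bytes in UTF-8.
boolean requiresMb4(String s) {
    int len = s.length();
    return len != s.codePointCount(0, len);
}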
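If you also want to know which characters are the problem, something along these lines might help. It is only a sketch, assuming Java 8 or later: it walks the code points directly instead of comparing lengths. The class name `Mb4Check` and the helper `supplementaryCodePoints` are made up for illustration; `String.codePoints()` and `Character.isSupplementaryCodePoint(int)` are standard JDK methods.

```java
import java.util.ArrayList;
import java.util.List;

class Mb4Check {

    // True if s contains at least one supplementary code point (above U+FFFF),
    // which UTF-8 encodes in 4 bytes.
    public static boolean requiresMb4(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    // Hypothetical helper: lists the offending code points as "U+XXXX" strings,
    // which can be handy for logging what a feed actually sent.
    public static List<String> supplementaryCodePoints(String s) {
        List<String> found = new ArrayList<>();
        s.codePoints()
         .filter(Character::isSupplementaryCodePoint)
         .forEach(cp -> found.add(String.format("U+%X", cp)));
        return found;
    }

    public static void main(String[] args) {
        String plain = "caf\u00E9";          // U+00E9 is in the BMP
        String emoji = "hi \uD83D\uDE00";    // surrogate pair for U+1F600

        System.out.println(requiresMb4(plain));             // prints false
        System.out.println(requiresMb4(emoji));             // prints true
        System.out.println(supplementaryCodePoints(emoji)); // prints [U+1F600]
    }
}
```

For well-formed strings this should agree with the length comparison above. An unpaired surrogate counts as a single code point in both approaches, so neither will flag it.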

Upvotes: 6
