bhuvesh

Reputation: 749

length of a String with surrogate characters in it - java

I am having trouble getting the length of a String that contains surrogate characters.

My String is:

String val1 = "\u5B66\uD8F0\uDE30";

The problem is that \uD8F0\uDE30 is one character, not two, so the length of the String should be 2.

But when I compute the length with val1.length(), it gives 3, which is not the result I expect. How can I get the actual number of characters in the String?

Upvotes: 4

Views: 1037

Answers (1)

Sufiyan Ghori

Reputation: 18763

length() returns the number of UTF-16 code units (char values) in the String, so a surrogate pair counts as two. To count code points instead, use codePointCount(beginIndex, endIndex):

val1.codePointCount(0, val1.length())

See the following example,

String val1 = "\u5B66\uD8F0\uDE30";
System.out.println("character count: " + val1.length());
System.out.println("code points: "+ val1.codePointCount(0, val1.length()));

Output:

character count: 3
code points: 2
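
To make the difference between the two counts concrete, here is a minimal sketch of how the two surrogate halves map to one code point. The class name SurrogatePairDemo and the helper combine are made up for illustration; the Character methods are the standard ones.

```java
class SurrogatePairDemo {
    // Hypothetical helper: combines a high/low surrogate pair into the
    // single code point it encodes, rejecting anything that is not a pair.
    static int combine(char high, char low) {
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        return Character.toCodePoint(high, low);
    }

    public static void main(String[] args) {
        String val1 = "\u5B66\uD8F0\uDE30";
        // Index 1 and 2 are the two halves of one supplementary character.
        System.out.println(Character.isHighSurrogate(val1.charAt(1))); // true
        System.out.println(Character.isLowSurrogate(val1.charAt(2)));  // true
        // Together they encode U+4C230 (decimal 311856).
        System.out.println(Integer.toHexString(combine(val1.charAt(1), val1.charAt(2)))); // 4c230
    }
}
```

So length() sees three char values, while codePointCount() sees \u5B66 plus one combined code point.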

FYI, charAt() will not give you a whole supplementary character either: it returns individual char values, so for a surrogate pair you only get one half at a time. To read each code point from a String, use codePointAt() together with offsetByCodePoints(index, codePointOffset), like this:

for (int i = 0; i < val1.codePointCount(0, val1.length()); i++) {
    System.out.println("character at " + i + ": " + val1.codePointAt(val1.offsetByCodePoints(0, i)));
}

gives,

character at 0: 23398
character at 1: 311856
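
The loop above calls offsetByCodePoints from index 0 on every iteration, which rescans the String each time. As an alternative sketch (not part of the original answer), you can walk the String once and advance by Character.charCount. The class name CodePointWalk and the method split are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointWalk {
    // Hypothetical helper: returns each code point of s as its own String,
    // visiting every char index at most once.
    static List<String> split(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            out.add(new String(Character.toChars(cp)));
            i += Character.charCount(cp); // 1 for a BMP character, 2 for a surrogate pair
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> parts = split("\u5B66\uD8F0\uDE30");
        for (int i = 0; i < parts.size(); i++) {
            // Each element is one full character; its length() is 1 or 2 chars.
            System.out.println("character at " + i + " uses " + parts.get(i).length() + " char(s)");
        }
    }
}
```

Whether the second character shows up as a visible glyph depends on your console and font, since U+4C230 may not have one.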

For Java 8 and later

You can use val1.codePoints(), which returns an IntStream of all code points in the sequence.

Since you are interested in the length of your String, use:

val1.codePoints().count();

To print the code points:

val1.codePoints().forEach(a -> System.out.println(a));
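
The forEach call above prints the code points as decimal numbers. If you would rather see them in the usual U+XXXX notation, something along these lines should work. This is only a sketch: the class name CodePointLabels and the method describe are invented for the example, and it assumes Java 8 or later.

```java
import java.util.stream.Collectors;

class CodePointLabels {
    // Hypothetical helper: formats every code point of s as U+XXXX,
    // separated by single spaces.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String val1 = "\u5B66\uD8F0\uDE30";
        System.out.println(describe(val1));            // U+5B66 U+4C230
        System.out.println(val1.codePoints().count()); // 2
    }
}
```

Note that count() returns a long, so cast it if you need an int.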

Upvotes: 12
