Reputation: 749
I am having trouble counting the length of my String, which has some surrogate characters in it. My String is:
String val1 = "\u5B66\uD8F0\uDE30";
The problem is that \uD8F0\uDE30 is one character, not two, so the length of the String should be 2. But when I calculate the length of my String as val1.length(), it gives 3 as output, which is totally wrong. How can I fix the problem and get the actual length of the String?
Upvotes: 4
Views: 1037
Reputation: 18763
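length() returns the number of UTF-16 code units (chars) in the String, and a supplementary character such as \uD8F0\uDE30 is stored as two of them (a surrogate pair). To count characters instead, use codePointCount(beginIndex, endIndex), which counts the number of code points in your String:
val1.codePointCount(0, val1.length())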
See the following example:
String val1 = "\u5B66\uD8F0\uDE30";
System.out.println("character count: " + val1.length());
System.out.println("code points: "+ val1.codePointCount(0, val1.length()));
Output:
character count: 3
code points: 2
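To see why the two counts differ, it can help to look at the individual char values. The following is a minimal sketch (the class name SurrogateInspect is just for illustration): it prints each UTF-16 unit with Character.isHighSurrogate/isLowSurrogate, then decodes the pair at indices 1 and 2 with Character.toCodePoint.

```java
public class SurrogateInspect {
    public static void main(String[] args) {
        String val1 = "\u5B66\uD8F0\uDE30";
        // length() counts UTF-16 code units (chars), not code points
        for (int i = 0; i < val1.length(); i++) {
            char c = val1.charAt(i);
            System.out.printf("index %d: U+%04X high=%b low=%b%n",
                    i, (int) c, Character.isHighSurrogate(c), Character.isLowSurrogate(c));
        }
        // The high/low pair at indices 1 and 2 encodes a single code point
        int cp = Character.toCodePoint(val1.charAt(1), val1.charAt(2));
        System.out.println("combined: " + cp + " (U+" + Integer.toHexString(cp).toUpperCase() + ")");
    }
}
```

This should print U+5B66 as an ordinary char, U+D8F0 as a high surrogate, U+DE30 as a low surrogate, and then combined: 311856 (U+4C230), the same value codePointAt reports for that character.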
FYI, you cannot get an individual supplementary character from a String using charAt() either: charAt() returns a single char, which for a surrogate pair is only one half of the character.
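A small sketch of that limitation, assuming the same val1 (class name is illustrative): charAt(1) yields only the high surrogate, while codePointAt(1) reads the whole pair, and Character.toChars turns the code point back into a two-char String.

```java
public class CharAtVsCodePointAt {
    public static void main(String[] args) {
        String val1 = "\u5B66\uD8F0\uDE30";
        // charAt(1) returns only the high surrogate, i.e. half of the character
        char half = val1.charAt(1);
        System.out.println("charAt(1) is a surrogate: " + Character.isSurrogate(half));
        // codePointAt(1) reads the whole surrogate pair starting at index 1
        int cp = val1.codePointAt(1);
        System.out.println("codePointAt(1): " + cp);
        // Character.toChars converts the code point back into chars (two here)
        String whole = new String(Character.toChars(cp));
        System.out.println("as a String, length " + whole.length());
    }
}
```

Expected output: charAt(1) is a surrogate: true, then codePointAt(1): 311856, then as a String, length 2.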
To get each character from a String as an int code point, use codePointAt and offsetByCodePoints(index, codePointOffset), like this:
int count = val1.codePointCount(0, val1.length());
for (int i = 0; i < count; i++) {
    // offsetByCodePoints(0, i) converts code point index i into a char index
    System.out.println("character at " + i + ": " + val1.codePointAt(val1.offsetByCodePoints(0, i)));
}
which prints the numeric code point values:
character at 0: 23398
character at 1: 311856
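Because offsetByCodePoints(0, i) walks from the start of the String on every call, that loop does roughly quadratic work on long strings. One alternative, sketched below (codePointsOf is a hypothetical helper name), is to step through char indices directly, advancing by Character.charCount of each code point.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointLoop {
    // Collects code points by walking char indices in a single pass
    static List<Integer> codePointsOf(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result.add(cp);
            i += Character.charCount(cp); // 2 for a surrogate pair, otherwise 1
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> cps = codePointsOf("\u5B66\uD8F0\uDE30");
        for (int n = 0; n < cps.size(); n++) {
            System.out.println("character at " + n + ": " + cps.get(n));
        }
    }
}
```

This prints the same two lines as the loop above, but visits each char only once.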
You can also use val1.codePoints() (Java 8 and later), which returns an IntStream of all code points in the sequence. Since you are interested in the length of your String, use (note that count() returns a long):
val1.codePoints().count();
To print the code points:
val1.codePoints().forEach(a -> System.out.println(a));
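The stream prints numbers; to get the characters themselves as text, each code point can be mapped back to a String. A minimal sketch (splitCodePoints is a hypothetical helper name) using Character.toChars, which works on Java 8; on Java 11 and later, Character.toString(int) does the same conversion.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CodePointStream {
    // Splits a String into one String per code point (surrogate pairs stay together)
    static List<String> splitCodePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String val1 = "\u5B66\uD8F0\uDE30";
        long length = val1.codePoints().count(); // count() on an IntStream is a long
        System.out.println("length: " + length);
        for (String ch : splitCodePoints(val1)) {
            System.out.println(ch + " (" + ch.length() + " char(s))");
        }
    }
}
```

Whether the second character renders depends on the console's encoding and font; its char count (2) does not.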
Upvotes: 12