Reputation: 38033
I have to work with code points above U+FFFF (specifically mathematical script characters) and have not found a simple tutorial on how to do this. I want to be able to (a) create Strings with high code points and (b) iterate over the characters in them. Since a char cannot hold these code points, my code looks like this:
    @Test
    public void testSurrogates() throws IOException {
        // creating a string
        StringBuffer sb = new StringBuffer();
        sb.append("a");
        sb.appendCodePoint(120030); // U+1D4DE, above U+FFFF, so stored as two chars
        sb.append("b");
        String s = sb.toString();
        System.out.println("s> " + s + " " + s.length());
        // iterating over the string, one code point at a time
        int codePointCount = s.codePointCount(0, s.length());
        Assert.assertEquals(3, codePointCount);
        int charIndex = 0;
        for (int i = 0; i < codePointCount; i++) {
            int codepoint = s.codePointAt(charIndex);
            int charCount = Character.charCount(codepoint);
            System.out.println(codepoint + " " + charCount);
            charIndex += charCount;
        }
    }
I'm not confident that this is either fully correct or the cleanest way to do this. I would have expected methods such as codePointAfter(), but there is only a codePointBefore(). Please confirm that this is the right strategy or suggest an alternative.
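To make the char limitation concrete, here is a minimal sketch (the class and method names are hypothetical, chosen only for illustration). It shows that a code point above U+FFFF occupies two chars, a surrogate pair, and that a (char) cast keeps only the low 16 bits of the value.

```java
class SurrogateDemo {
    // Builds a one-code-point String; Character.toChars returns one or two chars.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // What a (char) cast keeps: only the low 16 bits of the code point.
    static int truncated(int codePoint) {
        return (char) codePoint;
    }

    public static void main(String[] args) {
        int cp = 0x1D4DE; // decimal 120030, the code point used in the question
        String s = fromCodePoint(cp);
        System.out.println(s.length());                      // number of chars
        System.out.println(s.codePointCount(0, s.length())); // number of code points
        System.out.println(Character.isHighSurrogate(s.charAt(0)));
        System.out.println(Character.isLowSurrogate(s.charAt(1)));
        // The cast result is a different, unrelated character value.
        System.out.println(Integer.toHexString(truncated(cp)));
    }
}
```

Character.toChars and StringBuilder.appendCodePoint both produce the surrogate pair, so either is a safe way to get a supplementary code point into a String.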
UPDATE: Thanks for the confirmation @Jon. I struggled with this - here are two mistakes to avoid:
- there is no direct index into the code points (no s.getCodePoint(i)) - you have to iterate through them
- using (char) as a cast will truncate integers above 0xFFFF, and it's not easy to spot
Upvotes: 4
Views: 681
Reputation: 1499730
It looks correct to me. If you want to iterate over the code points in a string, you could wrap this code in an Iterable:
    import java.util.Iterator;
    import java.util.NoSuchElementException;

    public static Iterable<Integer> getCodePoints(final String text) {
        return new Iterable<Integer>() {
            @Override public Iterator<Integer> iterator() {
                return new Iterator<Integer>() {
                    // index of the next char (not code point) to read
                    private int nextIndex = 0;

                    @Override public boolean hasNext() {
                        return nextIndex < text.length();
                    }

                    @Override public Integer next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        int codePoint = text.codePointAt(nextIndex);
                        // advance by 1 or 2 chars, depending on the code point
                        nextIndex += Character.charCount(codePoint);
                        return codePoint;
                    }

                    @Override public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }
Or you could change the method to just return an int[], of course:
    public static int[] getCodePoints(String text) {
        int[] ret = new int[text.codePointCount(0, text.length())];
        int charIndex = 0;
        for (int i = 0; i < ret.length; i++) {
            ret[i] = text.codePointAt(charIndex);
            charIndex += Character.charCount(ret[i]);
        }
        return ret;
    }
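As a usage sketch for the array version, assume the method above is copied into a hypothetical class named CodePointArrays. The array can then be turned back into a String with the String(int[], int, int) constructor, which takes code points rather than chars. The fromCodePoints helper is an illustrative addition, not part of the original answer.

```java
import java.util.Arrays;

class CodePointArrays {
    // Same logic as the int[] version above, repeated so the example is self-contained.
    static int[] getCodePoints(String text) {
        int[] ret = new int[text.codePointCount(0, text.length())];
        int charIndex = 0;
        for (int i = 0; i < ret.length; i++) {
            ret[i] = text.codePointAt(charIndex);
            charIndex += Character.charCount(ret[i]);
        }
        return ret;
    }

    // Hypothetical helper: rebuilds a String from code points.
    // Supplementary values are written out as surrogate pairs.
    static String fromCodePoints(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // "a", U+1D4DE written as its surrogate pair, then "b"
        String s = "a\uD835\uDCDEb";
        int[] cps = getCodePoints(s);
        System.out.println(Arrays.toString(cps));
        System.out.println(fromCodePoints(cps).equals(s));
    }
}
```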
I agree that it's a pity that the Java libraries don't expose methods like this already, but at least they're not too hard to write.
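On Java 8 and later, CharSequence.codePoints() returns an IntStream of code points, which covers much of what the helpers above do. A minimal sketch, with a hypothetical class name and helper methods:

```java
import java.util.stream.Collectors;

class CodePointStreams {
    // Collects the code points of a String into an int[].
    static int[] toCodePoints(String text) {
        return text.codePoints().toArray();
    }

    // Formats each code point in U+XXXX notation, separated by spaces.
    static String describe(String text) {
        return text.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // "a", U+1D4DE written as its surrogate pair, then "b"
        String s = "a\uD835\uDCDEb";
        System.out.println(toCodePoints(s).length);
        System.out.println(describe(s));
    }
}
```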
Upvotes: 5