Reputation: 333
I'm trying to extract emojis and other special Characters from Strings for further processing (e.g. a String contains '😅' as one of its Characters).
But neither string.charAt(i)
nor string.substring(i, i+1)
works for me. The original String is UTF-8 encoded, but Java holds Strings internally as UTF-16, so the emoji above (U+1F605) is stored as the surrogate pair '\uD83D\uDE05'. That's why I get '?' (\uD83D) and '?' (\uDE05) at this position: the emoji occupies two char positions when I iterate over the String.
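To make the symptom concrete, here is a minimal sketch of what happens. The class name SurrogateDemo, its two helper methods, and the sample string are made up for illustration only:

```java
public class SurrogateDemo {
    // Number of UTF-16 code units; this is what length() and charAt() index over.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once here.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a', then the emoji U+1F605 as a surrogate pair, then 'b'
        String s = "a\uD83D\uDE05b";

        System.out.println(codeUnits(s));   // 4
        System.out.println(codePoints(s));  // 3

        // charAt() returns the two halves of the pair separately.
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(2)));  // true
    }
}
```

Each half on its own is not a printable character, which would explain the '?' output.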
Does anyone have a solution to this problem?
Upvotes: 3
Views: 1563
Reputation: 333
Thanks to John Kugelman for the help. The solution now looks like this:
for (int codePoint : codePoints(string)) {
    char[] chars = Character.toChars(codePoint);
    System.out.println(codePoint + " : " + String.copyValueOf(chars));
}
The codePoints(String string) method looks like this:
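import java.util.Iterator;
import java.util.NoSuchElementException;

private static Iterable<Integer> codePoints(final String string) {
    return new Iterable<Integer>() {
        public Iterator<Integer> iterator() {
            return new Iterator<Integer>() {
                int nextIndex = 0; // index of the next UTF-16 code unit to read

                public boolean hasNext() {
                    return nextIndex < string.length();
                }

                public Integer next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    // Reads a whole code point, combining a surrogate pair if present.
                    int result = string.codePointAt(nextIndex);
                    // Advance by 1 or 2 chars, depending on the code point.
                    nextIndex += Character.charCount(result);
                    return result;
                }

                public void remove() {
                    throw new UnsupportedOperationException();
                }
            };
        }
    };
}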
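On Java 8 or later, the hand-written iterator is probably not needed, because String.codePoints() returns an IntStream of code points. Here is a minimal sketch of that alternative. The class name CodePointStreamDemo and the helper splitByCodePoint are made up for illustration and are not part of the solution above:

```java
import java.util.List;
import java.util.stream.Collectors;

public class CodePointStreamDemo {
    // Splits a String into one String per Unicode code point,
    // so a surrogate pair stays together as a single element.
    static List<String> splitByCodePoint(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 'a', then the emoji U+1F605 as a surrogate pair, then 'b'
        for (String part : splitByCodePoint("a\uD83D\uDE05b")) {
            System.out.println(part.codePointAt(0) + " : " + part);
        }
    }
}
```

This still works per code point, so sequences made of several code points (for example an emoji with a skin-tone modifier) would be split into their parts.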
Upvotes: 1