Kotlin #toCharArray giving wrong characters

Question

I'm trying to loop through characters of a string. While doing so I found out that the #toCharArray function doesn't split special characters correctly. Here is my testing code:

val text = "\uD835\uDC9C"
text.toCharArray().forEach {
  println(it)
}

It gives me the following output:

?
?

So it seems to treat \uD835\uDC9C as 2 separate characters. But it should return only a single element, 𝒜.

Does anyone know how to get the correct character out of it?

Sweeper · Accepted Answer

Unfortunately, a Kotlin Char is 16 bits wide, so a character outside the Basic Multilingual Plane needs to be represented with 2 Chars (a surrogate pair). One Char is not enough.
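To see the pair concretely, here is a minimal sketch, assuming Kotlin on the JVM. It reuses the string from the question and only inspects it:

```kotlin
fun main() {
    val text = "\uD835\uDC9C" // U+1D49C, the character from the question

    // length counts 16-bit Chars (UTF-16 code units), not code points
    println(text.length)                          // 2
    println(text.codePointCount(0, text.length))  // 1

    // The two Chars are the high and low halves of one surrogate pair
    println(text[0].isHighSurrogate())            // true
    println(text[1].isLowSurrogate())             // true
}
```

This is why toCharArray returns two elements: it hands back the code units, and neither half is a printable character on its own.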

If you want to loop through all the Unicode codepoints in the string, use codePoints, which gives you an IntStream:

text.codePoints().forEachOrdered { 
    println(it)
}

This will print one line containing the integer code point of "𝒜" (119964, i.e. U+1D49C), as expected.

If you want to actually see "𝒜", you will need to convert the integer code point to a String. Recall that it can't be put into a Char.

text.codePoints().forEachOrdered {
    println(String(intArrayOf(it), 0, 1))
    // or println(Character.toString(it)) if you don't mind using java.lang.Character (the Int overload requires Java 11+)
}
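If you would rather have a list to iterate over, much like what toCharArray was expected to return, the same idea can be sketched without streams. The extension name codePointStrings below is hypothetical and not part of the standard library; this assumes Kotlin on the JVM:

```kotlin
// Hypothetical helper: returns one String per Unicode code point.
fun String.codePointStrings(): List<String> {
    val result = mutableListOf<String>()
    var i = 0
    while (i < length) {
        val cp = codePointAt(i)                  // reads a whole surrogate pair if one starts at i
        result += String(Character.toChars(cp))  // 1 or 2 Chars, wrapped in a String
        i += Character.charCount(cp)             // advance by 1 or 2 Chars
    }
    return result
}

fun main() {
    val text = "a\uD835\uDC9Cb"
    println(text.toCharArray().size)       // 4
    println(text.codePointStrings().size)  // 3
}
```

Each element is a String rather than a Char, because a code point outside the Basic Multilingual Plane does not fit in a single Char.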
