Marian

Reputation: 44

Kotlin #toCharArray giving wrong characters

I'm trying to loop through the characters of a string. While doing so, I found that the #toCharArray function doesn't split special characters correctly. Here is my test code:

val text = "\uD835\uDC9C"
text.toCharArray().forEach {
  println(it)
}

It gives me the following output:

?
?

So it seems to treat \uD835\uDC9C as two separate characters. But it should return only a single element, 𝒜.

Does anyone know how to get the correct character out of it?

Upvotes: 0

Views: 389

Answers (2)

konstantin durant

Reputation: 148

That's a surrogate pair. They can't be stored in a single character. What's the problem with printing it as a string?

val text = "\uD835\uDC9C"
println(text)

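This prints the character you want. If you have multiple characters in your string, note that the \u escapes are resolved at compile time, so the string contains no backslashes to split on; to create a list, group each high surrogate with the low surrogate that follows it.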

Upvotes: 0

Sweeper

Reputation: 270770

Unfortunately, a Kotlin Char is 16-bit, so characters outside the Basic Multilingual Plane need to be represented with 2 Chars (a surrogate pair). One Char is not enough.
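
To make the surrogate pair concrete, here is a minimal sketch, assuming Kotlin on the JVM. It only inspects the two Char values behind "𝒜"; nothing in it comes from the question beyond the string itself.

```kotlin
fun main() {
    val text = "\uD835\uDC9C" // U+1D49C, MATHEMATICAL SCRIPT CAPITAL A

    // A String stores UTF-16 code units, so this one code point takes two Chars.
    println(text.length)                          // 2
    println(text[0].isHighSurrogate())            // true
    println(text[1].isLowSurrogate())             // true

    // Counting code points instead of Chars gives the number you would expect.
    println(text.codePointCount(0, text.length))  // 1
}
```

This is why toCharArray yields two elements: each half of the pair is a valid Char, but neither is printable on its own.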

If you want to loop through all the Unicode codepoints in the string, use codePoints, which gives you an IntStream:

text.codePoints().forEachOrdered {
    println(it)
}

This will print one line containing the integer code point of "𝒜" (119964, i.e. U+1D49C), as expected.

If you want to actually see "𝒜", you will need to convert the integer code point to a String. Recall that it can't be put into a Char.

text.codePoints().forEachOrdered {
    println(String(intArrayOf(it), 0, 1))
    // or println(Character.toString(it)) on Java 11+, if you don't mind using java.lang.Character
}
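
If you need a list rather than a stream, the same idea can be sketched with a plain loop. This assumes Kotlin on the JVM; codePointStrings is a hypothetical helper name, not a standard library function.

```kotlin
// Hypothetical helper: splits a string into one String per Unicode code point.
fun codePointStrings(text: String): List<String> {
    val result = mutableListOf<String>()
    var i = 0
    while (i < text.length) {
        val codePoint = text.codePointAt(i)           // reads a full surrogate pair if one starts here
        val width = Character.charCount(codePoint)    // 1 for BMP characters, 2 for surrogate pairs
        result.add(text.substring(i, i + width))
        i += width
    }
    return result
}

fun main() {
    val parts = codePointStrings("a\uD835\uDC9Cb")
    println(parts.size)          // 3
    parts.forEach { println(it) }
}
```

Each element is a String, so a surrogate pair stays together. Note that this works per code point only; characters built from several code points (for example, a letter plus a combining accent) would still be split.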

Upvotes: 3
