Reputation: 44
I'm trying to loop through the characters of a string. While doing so, I found that the toCharArray
function doesn't split special characters correctly. Here is my test code:
val text = "\uD835\uDC9C"
text.toCharArray().forEach {
    println(it)
}
It gives me the following output:
?
?
So it seems to treat \uD835\uDC9C as two separate characters, but it should return only a single element, 𝒜.
Does anyone know how to get the correct character out of it?
Upvotes: 0
Views: 389
Reputation: 148
That's a surrogate pair. They can't be stored in a single character. What's the problem with printing it as a string?
val text = "\uD835\uDC9C"
println(text)
This prints the character you want. And if you have multiple characters in your string you can just split the string by every second backslash and create a list.
Upvotes: 0
Reputation: 270770
Unfortunately, a Kotlin Char is 16 bits wide, so characters outside the Basic Multilingual Plane need to be represented with two Chars (a surrogate pair). One Char is not enough.
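To make the surrogate pair visible, here is a small sketch for Kotlin on the JVM. The helper name charAndCodePointCounts is made up for illustration; it only compares the number of 16-bit Char units with the number of Unicode code points in a string.

```kotlin
// Hypothetical helper (not part of any library): returns the number of
// 16-bit Char units and the number of Unicode code points in the string.
fun charAndCodePointCounts(text: String): Pair<Int, Int> =
    Pair(text.length, text.codePointCount(0, text.length))

fun main() {
    val text = "\uD835\uDC9C" // U+1D49C, stored as a surrogate pair

    val (chars, codePoints) = charAndCodePointCounts(text)
    println(chars)                     // 2
    println(codePoints)                // 1
    println(text[0].isHighSurrogate()) // true
    println(text[1].isLowSurrogate())  // true
}
```

Neither half of the pair is a printable character on its own, which is why toCharArray prints two question marks.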
If you want to loop through all the Unicode code points in the string, use codePoints, which gives you an IntStream:
text.codePoints().forEachOrdered {
    println(it)
}
This will print one line containing the integer code point of "𝒜" (119964), as expected.
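If the decimal value is hard to recognise, the code point can be formatted in the usual U+ notation instead. This is only a sketch; the helper name toUPlus is my own and not a library function.

```kotlin
// Hypothetical helper: formats a code point as "U+XXXX" using uppercase hex,
// padded to at least four digits.
fun toUPlus(codePoint: Int): String = "U+%04X".format(codePoint)

fun main() {
    val text = "\uD835\uDC9C"
    text.codePoints().forEachOrdered {
        println(toUPlus(it)) // U+1D49C
    }
}
```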
If you want to actually see "𝒜", you will need to convert the integer code point to a String. Recall that it can't be put into a Char.
text.codePoints().forEachOrdered {
    println(String(intArrayOf(it), 0, 1))
    // or println(Character.toString(it)) on Java 11+, if you don't mind using java.lang.Character
}
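If you would rather have a list to iterate over, as toCharArray gives you, something along these lines should work on the JVM. The function name codePointStrings is made up for this sketch; each element is a String holding exactly one code point, so a surrogate pair stays together.

```kotlin
// Hypothetical helper: splits a string into one String per Unicode code point.
fun codePointStrings(text: String): List<String> =
    text.codePoints()
        .toArray()                             // IntArray of code points
        .map { String(Character.toChars(it)) } // 1 or 2 Chars per code point

fun main() {
    val text = "a\uD835\uDC9Cb"
    val parts = codePointStrings(text)
    println(parts.size) // 3
    parts.forEach { println(it) }
}
```

Note that this splits by code point only. Sequences that combine several code points into one visible symbol (for example a letter followed by a combining accent) will still come out as separate elements.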
Upvotes: 3