Reputation: 2311
How can I declare a Char range in Kotlin that encloses a four-byte range?
private val CJK_IDEOGRAPHS_EXT_A = '\u3400' .. '\u4DBF' // OK
private val CJK_IDEOGRAPHS_EXT_B = '\u20000' .. '\u2A6DF' // doesn't compile
I tried the following hack, but I get the warning, "this cast can never succeed":
private val CJK_IDEOGRAPHS_EXT_B: CharRange = 0x20000 as Char .. 0x2A6DF as Char
Basically I want to implement a function like this:
fun isCJK(c: Char): Boolean {
return c in CJK_RADICALS ||
c in CJK_SYMBOLS ||
c in CJK_STROKES ||
c in CJK_ENCLOSED ||
c in CJK_IDEOGRAPHS ||
c in CJK_COMPAT ||
c in CJK_COMPAT_IDEOGRAPHS ||
c in CJK_COMPAT_FORMS ||
c in CJK_IDEOGRAPHS_EXT_A
// EXT_B not working
// EXT_C not working
// EXT_D not working
// EXT_E not working
// EXT_F not working
}
I'm using Kotlin under Android.
Upvotes: 2
Views: 1468
Reputation: 170839
On the JVM, Char is a 16-bit UTF-16 code unit, so the largest code point it can represent is 0xFFFF; the ranges you mention are represented by surrogate pairs, i.e. two Char values each. So your function should take a String instead, e.g.
private val CJK_IDEOGRAPHS_EXT_B: IntRange = 0x20000 .. 0x2A6DF
...
fun isCJK(s: String): Boolean {
if (s.codePointCount(0, s.length) != 1)
throw IllegalArgumentException("String \"$s\" must contain exactly 1 code point")
val c = s.codePointAt(0)
return c in CJK_RADICALS ||
c in CJK_SYMBOLS ||
c in CJK_STROKES ||
c in CJK_ENCLOSED ||
c in CJK_IDEOGRAPHS ||
c in CJK_COMPAT ||
c in CJK_COMPAT_IDEOGRAPHS ||
c in CJK_COMPAT_FORMS ||
c in CJK_IDEOGRAPHS_EXT_A ||
c in CJK_IDEOGRAPHS_EXT_B || ...
}
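To make the surrogate-pair point concrete, here is a minimal sketch. The names `isCjkCodePoint` and `extBSample` are hypothetical, and only the two ranges quoted in the question are included; the remaining blocks would presumably be declared the same way.

```kotlin
// Ranges taken from the question; the other CJK blocks are omitted in this sketch.
val CJK_IDEOGRAPHS_EXT_A: IntRange = 0x3400..0x4DBF
val CJK_IDEOGRAPHS_EXT_B: IntRange = 0x20000..0x2A6DF

// Hypothetical helper: classifies one Unicode code point passed as an Int,
// so values above 0xFFFF are not a problem.
fun isCjkCodePoint(codePoint: Int): Boolean =
    codePoint in CJK_IDEOGRAPHS_EXT_A || codePoint in CJK_IDEOGRAPHS_EXT_B

// U+20000 written as its UTF-16 surrogate pair: two Chars, one code point.
val extBSample: String = "\uD840\uDC00"
```

Here `extBSample.length` counts two Char units while `extBSample.codePointCount(0, extBSample.length)` counts a single code point, which is why the check has to work on Int code points rather than on Char.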
Java 8 added a much more convenient IntStream codePoints() method on CharSequence, but on Android it is only available from API level 24 onward.
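Where codePoints() cannot be used, the code points of a string can be walked by hand with codePointAt and Character.charCount. The following is a sketch only, and `codePointsOf` is a hypothetical helper name, not a library function.

```kotlin
// Hypothetical helper: collects the code points of s without using codePoints().
fun codePointsOf(s: String): List<Int> {
    val result = ArrayList<Int>()
    var i = 0
    while (i < s.length) {
        val cp = s.codePointAt(i)
        result.add(cp)
        // Advance by 1 Char for a BMP code point, by 2 for a surrogate pair.
        i += Character.charCount(cp)
    }
    return result
}
```

With that in place, an expression such as `codePointsOf(text).any { it in CJK_IDEOGRAPHS_EXT_B }` would test a whole string rather than a single character.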
Upvotes: 2