Reputation: 359
The code:
val plainText = "plainText"
val plainTextWithEmoji = "plainText🥰🥰🥰"
println("plainText=$plainText, length=${plainText.length}")
println("plainTextWithEmoji=$plainTextWithEmoji, length=${plainTextWithEmoji.length}")
// Output:
// plainText=plainText, length=9
// plainTextWithEmoji=plainText🥰🥰🥰, length=15
This shows that each emoji has a length of 2, not 1, because length counts UTF-16 code units rather than visible characters.
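To make the numbers above concrete, here is a minimal sketch that compares the two counts on the JVM. The helper names utf16Length and codePointLength are made up for illustration. Note that code points are still not graphemes: a single visible symbol can consist of several code points.

```kotlin
// Number of UTF-16 code units, which is what CharSequence.length reports.
fun utf16Length(text: CharSequence): Int = text.length

// Number of Unicode code points; a surrogate pair counts as one.
fun codePointLength(text: CharSequence): Int =
    Character.codePointCount(text, 0, text.length)
```

For "plainText🥰🥰🥰", utf16Length gives 15 and codePointLength gives 12, since each 🥰 occupies two code units but is one code point.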
When I want to remove the last character:
If I call plainTextWithEmoji.subSequence(0, plainTextWithEmoji.length - 1), the result is wrong, because the emoji's length is more than 1.
To get the correct result from subSequence, I have to call plainTextWithEmoji.subSequence(0, plainTextWithEmoji.length - 2)
But in general, we cannot know whether the last character's length is 1, so simply calling charSequence.subSequence(0, charSequence.length - 1)
can return a wrong result.
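The wrong result can be checked directly. The sketch below uses two hypothetical helpers (dropLastCodeUnit and endsWithDanglingSurrogate, neither is a standard API) to show that cutting one code unit off an emoji leaves half of a surrogate pair behind.

```kotlin
// Naive approach: drop exactly one UTF-16 code unit from the end.
fun dropLastCodeUnit(text: CharSequence): CharSequence =
    if (text.isEmpty()) text else text.subSequence(0, text.length - 1)

// True if the text ends with a high surrogate that has no low surrogate after it.
fun endsWithDanglingSurrogate(text: CharSequence): Boolean =
    text.isNotEmpty() && Character.isHighSurrogate(text[text.length - 1])
```

For "plainText🥰🥰🥰", endsWithDanglingSurrogate(dropLastCodeUnit(...)) is true, while for "plainText" it is false.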
So, is there any way to remove the last grapheme of a CharSequence? Thanks!
Upvotes: 2
Views: 137
Reputation: 359
Finally, I found a solution inspired by this post. Since UTF-16 (the encoding behind CharSequence indices on the JVM) is variable-length, to call CharSequence.subSequence and get the correct result we can find the start index of every grapheme in the text with BreakIterator:
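import java.text.BreakIterator

// Removes the last grapheme (user-perceived character); an empty sequence is returned unchanged.
fun CharSequence.removeLast(): CharSequence {
    // The start index of the last grapheme is where the result must end.
    val lastStart = computeGraphemesStartIndexes(this).lastOrNull() ?: return this
    return this.subSequence(0, lastStart)
}

// Returns the start index of every grapheme in the sequence, in order.
private fun computeGraphemesStartIndexes(sequence: CharSequence): List<Int> {
    val breakIterator = BreakIterator.getCharacterInstance()
    breakIterator.setText(sequence.toString())
    val graphemesStartIndexes = mutableListOf<Int>()
    val start = breakIterator.first()
    graphemesStartIndexes.add(start)
    while (breakIterator.next() != BreakIterator.DONE) {
        graphemesStartIndexes.add(breakIterator.current())
    }
    // The final boundary is the end of the text, not the start of a grapheme.
    return graphemesStartIndexes.apply { removeAt(size - 1) }
}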
Example:
val plainTextEmojiSequence = "Hello😊"
val plainTextOnlySequence = "Hi~!"
println(plainTextEmojiSequence.removeLast()) // "Hello"
println(plainTextOnlySequence.removeLast()) // "Hi~"
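Since only the final boundary matters, a possible variation is to walk the BreakIterator backwards from the end instead of collecting every start index. This is a sketch under that assumption, not a tested drop-in replacement, and dropLastGrapheme is a hypothetical name chosen to avoid clashing with removeLast.

```kotlin
import java.text.BreakIterator

// Hypothetical helper: removes the last grapheme by looking only at the final boundary.
fun CharSequence.dropLastGrapheme(): CharSequence {
    if (isEmpty()) return this
    val iterator = BreakIterator.getCharacterInstance()
    iterator.setText(toString())
    iterator.last()               // move to the boundary at the end of the text
    val cut = iterator.previous() // boundary where the last grapheme starts
    return subSequence(0, cut)
}
```

With this, "Hello😊".dropLastGrapheme() gives "Hello", and "cafe\u0301" (an e followed by a combining acute accent) gives "caf". How longer emoji sequences, such as those joined with zero-width joiners, are split may vary with the JDK or Android version, so it is worth testing on the target runtime.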
Upvotes: 2