calvinfly

Reputation: 453

How to truncate a String containing emoji by byte size

I would like to limit a String to 30 bytes when encoded as UTF-8, and I found a solution for this,

so I created a method based on it:

public static String truncateTextByByteLimit(String message, int byteLimit) {
    String result = "";
    try {
        Charset utf8Charset = Charset.forName("UTF-8");
        CharsetDecoder cd = utf8Charset.newDecoder();
        byte[] utf8Bytes = message.getBytes(utf8Charset);
        System.out.println("check message: " + message + " /length: " +message.length()+ " //byte length: " + utf8Bytes.length + "/limit: " + byteLimit + " /codePoint: " +message.codePointCount(0, message.length()));
        ByteBuffer bb = ByteBuffer.wrap(utf8Bytes, 0, byteLimit);
        CharBuffer cb = CharBuffer.allocate(byteLimit);
        // Ignore an incomplete character
        cd.onMalformedInput(CodingErrorAction.IGNORE);
        cd.decode(bb, cb, true);
        cd.flush(cb);
        result = new String(cb.array(), 0, cb.position());
        if (result.length()<=0) {
            return truncateTextByByteLimit(message, (byteLimit+1));
        } else {
            return result;
        }
    } catch (Exception e) {
        e.printStackTrace();

        return message;
    }
}

The problem is that when I test a String containing an emoji, like below: System.out.println(truncateTextByByteLimit("let's \uD83D\uDE09", 30));

it throws this error:

java.lang.IndexOutOfBoundsException
at java.nio.ByteBuffer.wrap(ByteBuffer.java:371)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

and my debug message shows: check message: let's 😉 /length: 8 //byte length: 10/limit: 30 /codePoint: 7

When I test with the same message and a byteLimit less than or equal to 10, it works without error.

So I don't understand why it throws java.lang.IndexOutOfBoundsException.

Upvotes: 1

Views: 990

Answers (1)

Makoto

Reputation: 106460

ByteBuffer#wrap has a limitation on what's allowed to be the length.

The length of the subarray to be used; must be non-negative and no larger than array.length - offset. The new buffer's limit will be set to offset + length.

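In your test the encoded message is only 10 bytes long but byteLimit is 30, so the requested length is larger than array.length - offset and wrap throws. To remedy that, take the lesser of the two lengths: either your absolute maximum byteLimit or the size of the utf8Bytes array.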

ByteBuffer.wrap(utf8Bytes, 0, Math.min(utf8Bytes.length, byteLimit));
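For context, here is a minimal sketch of how the whole method might look with that clamp applied. The class name Utf8Truncate is hypothetical, and two changes go beyond the fix itself and are my assumptions rather than anything the question requires: a negative byteLimit is treated as 0, and the recursive retry with byteLimit+1 is dropped, because once the length is clamped an empty message would always decode to an empty result and the retry would recurse without end.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical holder class; the name does not come from the question.
final class Utf8Truncate {

    static String truncateTextByByteLimit(String message, int byteLimit) {
        byte[] utf8Bytes = message.getBytes(StandardCharsets.UTF_8);

        // Clamp the length so wrap() is never asked for more bytes than the
        // array holds (and never for a negative count; that guard is an assumption).
        int length = Math.max(0, Math.min(utf8Bytes.length, byteLimit));
        ByteBuffer bb = ByteBuffer.wrap(utf8Bytes, 0, length);

        // Every decoded char consumes at least one byte, so 'length' chars is enough room.
        CharBuffer cb = CharBuffer.allocate(length);

        CharsetDecoder cd = StandardCharsets.UTF_8.newDecoder();
        // Drop an incomplete multi-byte sequence left at the cut point.
        cd.onMalformedInput(CodingErrorAction.IGNORE);
        cd.decode(bb, cb, true);
        cd.flush(cb);

        return new String(cb.array(), 0, cb.position());
    }

    public static void main(String[] args) {
        String s = "let's \uD83D\uDE09";
        System.out.println(truncateTextByByteLimit(s, 30)); // limit above the 10 encoded bytes
        System.out.println(truncateTextByByteLimit(s, 8));  // limit falls inside the 4-byte emoji
    }
}
```

With a limit of 30 the message comes back unchanged, since it only occupies 10 bytes. With a limit of 8 the cut lands in the middle of the emoji, so the decoder discards the partial sequence and the result is "let's " with its trailing space. If the very first character does not fit in the limit, this sketch returns an empty string instead of exceeding the limit.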

Upvotes: 1
