dan

Reputation: 91

How many bytes does a string take in UTF-8?

I have this code in my Java program:

    public static int NumOfStringByte(String str) throws UnsupportedEncodingException {
        return str.getBytes("UTF-8").length+2;
    }

Is this correct? How can I calculate the number of bytes in a string?

Upvotes: 1

Views: 3658

Answers (1)

Michael Aaron Safyan

Reputation: 95459

In Java, calling getBytes("UTF-8") already gives you exactly the bytes of the string's UTF-8 encoding, so you should simply return the length of that byte array, without the +2. The only reason to add to that number is if you are writing some additional bytes alongside the string (such as a NUL terminator, a length prefix, or a byte-order mark); if you do that, you should choose a clearer function name that says so.
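As a minimal sketch of that advice (the class name Utf8Length and the method name utf8Length are made up for illustration), the helper below returns only the encoded length. It passes StandardCharsets.UTF_8 instead of the string "UTF-8", an overload of getBytes that does not throw UnsupportedEncodingException:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here only for illustration.
class Utf8Length {
    // Number of bytes in the UTF-8 encoding of str (no BOM, no terminator).
    static int utf8Length(String str) {
        return str.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("hello"));      // ASCII only: 1 byte per character
        System.out.println(utf8Length("h\u00e9llo")); // U+00E9 needs 2 bytes in UTF-8
        System.out.println(utf8Length("\u20ac"));     // U+20AC (euro sign) needs 3 bytes
    }
}
```

If the caller really does need extra bytes (for example a 2-byte length prefix), it is probably clearer to add them at the call site or in a separately named method than to hide them in the length helper.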

Note, however, that the length of the UTF-8 encoding is NOT the same as the String's footprint in memory. Java represents strings as sequences of UTF-16 code units, so the character data takes str.length() * 2 bytes (basically, str.length() gives you the number of char values in the underlying buffer, and each char is 2 bytes). That figure ignores object overhead, and since Java 9 the JVM may store strings that contain only Latin-1 characters using 1 byte per character.
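To make the difference concrete, here is a small sketch (the class and method names are hypothetical) that compares the UTF-16 code unit count, the size of the UTF-16 encoding, and the size of the UTF-8 encoding for the same string. It measures encoded lengths only, not actual heap usage:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here only for illustration.
class StringSizes {
    // Number of UTF-16 code units (Java char values); this is what length() reports.
    static int utf16Units(String s) {
        return s.length();
    }

    // Bytes in the UTF-16 encoding without a byte-order mark: 2 per code unit.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    // Bytes in the UTF-8 encoding: 1 to 4 per code point.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'a' (U+0061), U+00E9, and U+1F600 written as a surrogate pair.
        String s = "a\u00e9\uD83D\uDE00";
        System.out.println("code units:   " + utf16Units(s));
        System.out.println("UTF-16 bytes: " + utf16Bytes(s));
        System.out.println("UTF-8 bytes:  " + utf8Bytes(s));
    }
}
```

For this sample string the three values differ: the surrogate pair counts as two char values but is a single 4-byte sequence in UTF-8.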

Upvotes: 1
