pankaj gambhir

Reputation: 61

character taking 6 bytes

We are trying to save the string below to the database. It is a name that we get back from an API call:

株式会社エス・ダブリュー・コミュニケーションズ

While saving it through our code (servlet, then Hibernate, then the database), we get this error:

Caused by: java.sql.BatchUpdateException: ORA-12899: value too large for column "NAME_ON_ACCOUNT" (actual: 138, maximum: 100)

The name is 23 characters long, but it looks like each character is taking 6 bytes; only that would give 138 (23 × 6).

The code below gives me 69:

byte[] utf8Bytes = string.getBytes("UTF-8");    
System.out.println(utf8Bytes.length);

And this gives me 92:

byte[] utf32Bytes = string.getBytes("UTF-32");
System.out.println(utf32Bytes.length);

I will certainly check NLS_CHARACTERSET and look at the IO classes, but have you ever seen a character take 6 bytes? Any help will be much appreciated.

Upvotes: 6

Views: 585

Answers (2)

Esailija

Reputation: 140230

You probably have the literal escape text rather than the characters themselves, that is:

\u682a\u5f0f\u4f1a\u793e\u30a8\u30b9\u30fb\u30c0\u30d6\u30ea\u30e5\u30fc\u30fb\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u30ba

See:

"\u682a\u5f0f\u4f1a\u793e\u30a8\u30b9\u30fb\u30c0\u30d6\u30ea\u30e5\u30fc\u30fb\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u30ba".length();
//23, or 69 UTF-8 bytes

Vs:

"\\u682a\\u5f0f\\u4f1a\\u793e\\u30a8\\u30b9\\u30fb\\u30c0\\u30d6\\u30ea\\u30e5\\u30fc\\u30fb\\u30b3\\u30df\\u30e5\\u30cb\\u30b1\\u30fc\\u30b7\\u30e7\\u30f3\\u30ba".length();
//138, or 138 UTF-8 bytes
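
If that is what is happening, one way to confirm it and recover the real name is to turn each literal backslash-u sequence back into its character before saving. The sketch below is only an illustration under that assumption: the class name UnicodeEscapeCheck and the method unescape are made up for this example, and it uses just the first four characters of the name.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeEscapeCheck {
    // Matches a literal backslash, the letter u, and four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /** Replaces each literal six-character escape with the UTF-16 code unit it names. */
    public static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());                    // text before the escape
            out.append((char) Integer.parseInt(m.group(1), 16));   // the decoded code unit
            last = m.end();
        }
        out.append(input, last, input.length());                   // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        // First four characters of the name, as literal escape text (6 chars each).
        String escaped = "\\u682a\\u5f0f\\u4f1a\\u793e";
        String decoded = unescape(escaped);

        System.out.println(escaped.length());                                 // 24
        System.out.println(escaped.getBytes(StandardCharsets.UTF_8).length);  // 24
        System.out.println(decoded.length());                                 // 4
        System.out.println(decoded.getBytes(StandardCharsets.UTF_8).length);  // 12
    }
}
```

The same ratio applied to all 23 characters gives 138 for the escaped text and 69 UTF-8 bytes for the decoded name, which matches the numbers in the question. The better fix is usually to find where the escaping is introduced (for example, an API response that is not being JSON-decoded) rather than to unescape after the fact.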

Upvotes: 0

Zdenek

Reputation: 710

The string probably holds HTML entities, like &#29123; (the entity for 燃), or possibly URL-style percent-encoding, like %8C%9A. Or maybe UTF-7, like [Ay76b. (I made up those values, but your actual ones will look similar.) Relying on any framework for character encoding is always a pain, because its authors were likely from the U.S. or Europe, where simple ANSI, with one byte per character, was sufficient. If you work out which encoding you actually have and convert it to real UTF-8 or even UTF-16, the name will take up less space in this particular case.
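
To check whether one of these forms is what you actually have, you could try decoding the value and comparing lengths. This is a rough sketch, not a full decoder: the class EncodedNameCheck and its method names are invented for the example, it handles only decimal HTML character references and UTF-8 percent-encoding, and the sample values are the encodings of 株, the first character of the name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodedNameCheck {
    // Decimal numeric character references such as &#26666;
    private static final Pattern DECIMAL_ENTITY = Pattern.compile("&#(\\d{1,7});");

    /** Replaces decimal HTML character references with the code points they name. */
    public static String decodeDecimalEntities(String input) {
        Matcher m = DECIMAL_ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            out.append(input, last, m.start());
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append(m.group()); // not a valid code point: keep the text as-is
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    /** Decodes %XX sequences as UTF-8 bytes. Note: URLDecoder also turns '+' into a space. */
    public static String decodePercent(String input) {
        try {
            return URLDecoder.decode(input, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String entity = "&#26666;";    // 8 chars for one character
        String percent = "%E6%A0%AA";  // 9 chars for one character
        System.out.println(entity.length() + " -> " + decodeDecimalEntities(entity).length());  // 8 -> 1
        System.out.println(percent.length() + " -> " + decodePercent(percent).length());        // 9 -> 1
    }
}
```

If the decoded value comes out much shorter than what you are trying to insert, you have found the form the name is arriving in. Neither of these two forms is exactly 6 characters per character, though, so logging the raw value before it reaches Hibernate is the quickest way to see what is really there.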

Upvotes: 3
