Humberto

Reputation: 7199

Byte-based length of CLR string

Short version of the question: Is there a way of counting how many bytes it will take to store the characters of a string in a VARCHAR(n) column in an Oracle database?

Longer version: The following Oracle SQL script will fail at the third statement. It tries to insert 10 characters into a VARCHAR(10) column; however, one of those characters is an A with an acute accent ('á').

CREATE TABLE TESTTABLE (NAME VARCHAR(10) NULL);

INSERT INTO TESTTABLE (NAME) VALUES ('abcdefghij');
--1 rows inserted.

INSERT INTO TESTTABLE (NAME) VALUES ('ábcdefghij');
--ORA-12899: value too large for column "ACME"."TESTTABLE"."NAME" (actual: 11, maximum: 10)

My C# application stores strings in an Oracle database, and I can't simply change the column type to NVARCHAR2(10), which would work. The application is expected to trim any longer string down to 10 characters so that Oracle won't complain about its length. But a String.Length-based trim is a very naïve strategy: it would blindly leave "ábcdefghij" intact, since it is exactly 10 CLR characters long.

How can I detect that 'á' will take 2 bytes in the database row so I can trim the string to "ábcdefghi" before issuing the INSERT statement?
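For illustration (assuming the database character set is UTF-8, where 'á' encodes as two bytes), the mismatch is easy to see on the C# side:

// 10 CLR characters, but 11 bytes once encoded.
string name = "ábcdefghij";
Console.WriteLine(name.Length);                                  // 10 characters
Console.WriteLine(System.Text.Encoding.UTF8.GetByteCount(name)); // 11 bytes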

EDIT: this question is similar to Best way to shorten UTF8 string based on byte length

Upvotes: 1

Views: 514

Answers (4)

Justin Cave

Reputation: 231661

By default, VARCHAR2(10) allocates 10 bytes of storage, which may or may not equate to 10 characters, depending on the database character set and the data. You can, however, declare the column so that it will always store 10 characters.

When you are using a variable-width character set, you almost always want to declare the column using character-length semantics: VARCHAR2(10 CHAR). Then you can always store 10 characters, whether or not some of them require more than 1 byte of storage.

If you're stuck declaring columns using byte-length semantics for some reason, you could use the LENGTHB or VSIZE function to return the length of the string in bytes.

select lengthb('ábcdefghij'), vsize('ábcdefghij')
  from dual;

will return 11 for both columns of the result.

Upvotes: 4

Egor Skriptunoff

Reputation: 23727

The Oracle function length(string) returns the number of characters, while lengthb(string) returns the number of bytes.

Upvotes: 5

Rafael

Reputation: 2817

You can get the length of the string in bytes like this:

using System.Text;

// Encode the string as UTF-8 and count the resulting bytes.
byte[] utf8Bytes = Encoding.UTF8.GetBytes("ábcdefghij");
int byteLength = utf8Bytes.Length;

In fact, for your example, it returns 11.
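If you only need the count, System.Text.Encoding.UTF8.GetByteCount("ábcdefghij") returns the same value without allocating the intermediate byte array.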

Upvotes: 1

Billy ONeal

Reputation: 106530

This depends on the encoding Oracle is set to use. Convert the string instance into a byte array using the corresponding System.Text.Encoding instance, such as System.Text.Encoding.UTF8. (You're looking for the GetBytes method.)
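A minimal sketch of that approach, assuming the column uses byte-length semantics and the database character set is UTF-8 (swap in whichever System.Text.Encoding actually matches it); TrimToByteLength is just an illustrative helper name, not a library method:

using System.Text;

// Drop trailing characters until the encoded form fits within maxBytes.
static string TrimToByteLength(string value, int maxBytes, Encoding encoding)
{
    while (value.Length > 0 && encoding.GetByteCount(value) > maxBytes)
    {
        value = value.Substring(0, value.Length - 1);
    }
    return value;
}

// "ábcdefghij" is 10 CLR characters but 11 UTF-8 bytes; this trims it to "ábcdefghi".
string trimmed = TrimToByteLength("ábcdefghij", 10, Encoding.UTF8);

Note that this removes one UTF-16 code unit at a time, so strings ending in surrogate pairs or combining marks may need extra care to avoid splitting a character.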

Upvotes: 4
