Why do length(column) and lengthb(column) return same length?

Question

length(column) and lengthb(column) are returning the same length in Oracle, even multibyte characters included in values. While checking lengthb by copy paste, multibyte column value returns a greater value.

SELECT column1, 
       Length(column1)              AS length_C, 
       Lengthb(column1)              AS length_B, 
       Lengthb('100749 ¬ 100749 ¬ ') AS bytelength 
FROM   db.sample

+-------------------------------------------------------+
|            column1 | length_C  |length_B  |bytelength |
+-------------------------------------------------------+
|100749 ¬ 100749 ¬   |  17       |   17     |   19      |
+-------------------------------------------------------+

Roberto Hernandez · Accepted Answer

Knowing that both belong to the same family of LENGTH functions:

Length ( of Characters )
Lengthb ( of Bytes )
Lengthc ( Unicode characters, normalazing where possible )

I am going to show you an example and for that I would be using a database with characterset AL32UTF8. UTF-8 is the most popular type of Unicode encoding. It uses one byte for standard English letters and symbols, two bytes for additional Latin and Middle Eastern characters, and three bytes for Asian characters. Additional characters can be represented using four bytes. UTF-8 is backwards compatible with ASCII, since the first 128 characters are mapped to the same values.

SQL> select value from nls_database_parameters where parameter = 'NLS_CHARACTERSET'
  2  ;

VALUE
--------------------------------------------------------------------------------
AL32UTF8

SQL> with t as ( select 'abcdefghijk' as c1, 'üäöß#+#üöä' as c2 from dual )
  2  select length(c1) , lengthb(c1) , lengthc ( c1 ) from t ;

LENGTH(C1) LENGTHB(C1) LENGTHC(C1)
---------- ----------- -----------
        11          11          11

SQL>  with t as ( select 'abcdefghijk' as c1, 'üäöß#+#üöä' as c2 from dual )
  2  select length(c2) , lengthb(c2), lengthc(c2) from t ;

LENGTH(C2) LENGTHB(C2) LENGTHC(C2)
---------- ----------- -----------
        17          45          17

In the example, C1 contains only English normal letters, that is why the three functions return the same. In the case of c2, you might see the difference between characters, bytes and unicode.

In those cases I always recommend to use DUMP(). It is the best way to understand the internal representation of these characters.

SQL>  with t as ( select 'abcdefghijk' as c1, 'üäöß#+#üöä' as c2 from dual )
  2  select length(c1) as length_characters , dump(c1) as dump from t ;

LENGTH_CHARACTERS DUMP
----------------- -------------------------------------------------------
               11 Typ=96 Len=11: 97,98,99,100,101,102,103,104,105,106,107

SQL> with t as ( select 'abcdefghijk' as c1, 'üäöß#+#üöä' as c2 from dual )
  2  select length(c2) as length_characters , dump(c2) as dump from t ;

LENGTH_CHARACTERS
-----------------
DUMP
--------------------------------------------------------------------------------
               17
Typ=96 Len=45: 239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,1
91,189,239,191,189,239,191,189,35,43,35,239,191,189,239,191,189,239,191,189,239,
191,189,239,191,189,239,191,189

In your case, you made a mistake as you use twice lengthb ( I guess one should have been length ). Check the internal representation of your string:

SQL> select dump('100749 ¬ 100749 ¬ ',1016) from dual ;

DUMP('100749??100749??',1016)
------------------------------------------------------------------------------------------------------------------------
Typ=96 Len=28 CharacterSet=AL32UTF8: 31,30,30,37,34,39,20,ef,bf,bd,ef,bf,bd,20,31,30,30,37,34,39,20,ef,bf,bd,ef,bf,bd,20

SQL>

Why do length(column) and lengthb(column) return same length?

Answers (1)

Related Questions