Reputation: 3
length(column)
and lengthb(column)
are returning the same length in Oracle, even multibyte characters included in values. While checking lengthb
by copy paste, multibyte column value returns a greater value.
SELECT column1,
Length(column1) AS length_C,
Lengthb(column1) AS length_B,
Lengthb('100749 ¬ 100749 ¬ ') AS bytelength
FROM db.sample
+-------------------------------------------------------+
| column1 | length_C |length_B |bytelength |
+-------------------------------------------------------+
|100749 ¬ 100749 ¬ | 17 | 17 | 19 |
+-------------------------------------------------------+
Upvotes: 0
Views: 1635
Reputation: 8518
Knowing that both belong to the same family of LENGTH functions:
I am going to show you an example and for that I would be using a database with characterset AL32UTF8. UTF-8 is the most popular type of Unicode encoding. It uses one byte for standard English letters and symbols, two bytes for additional Latin and Middle Eastern characters, and three bytes for Asian characters. Additional characters can be represented using four bytes. UTF-8 is backwards compatible with ASCII, since the first 128 characters are mapped to the same values.
SQL> select value from nls_database_parameters where parameter = 'NLS_CHARACTERSET'
2 ;
VALUE
--------------------------------------------------------------------------------
AL32UTF8
SQL> with t as ( select 'abcdefghijk' as c1, 'üäöß#+#üöä' as c2 from dual )
2 select length(c1) , lengthb(c1) , lengthc ( c1 ) from t ;
LENGTH(C1) LENGTHB(C1) LENGTHC(C1)
---------- ----------- -----------
11 11 11
SQL> with t as ( select 'abcdefghijk' as c1, 'üäöß#+#üöä' as c2 from dual )
2 select length(c2) , lengthb(c2), lengthc(c2) from t ;
LENGTH(C2) LENGTHB(C2) LENGTHC(C2)
---------- ----------- -----------
17 45 17
In the example, C1 contains only English normal letters, that is why the three functions return the same. In the case of c2, you might see the difference between characters, bytes and unicode.
In those cases I always recommend to use DUMP(). It is the best way to understand the internal representation of these characters.
SQL> with t as ( select 'abcdefghijk' as c1, 'üäöß#+#üöä' as c2 from dual )
2 select length(c1) as length_characters , dump(c1) as dump from t ;
LENGTH_CHARACTERS DUMP
----------------- -------------------------------------------------------
11 Typ=96 Len=11: 97,98,99,100,101,102,103,104,105,106,107
SQL> with t as ( select 'abcdefghijk' as c1, 'üäöß#+#üöä' as c2 from dual )
2 select length(c2) as length_characters , dump(c2) as dump from t ;
LENGTH_CHARACTERS
-----------------
DUMP
--------------------------------------------------------------------------------
17
Typ=96 Len=45: 239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,1
91,189,239,191,189,239,191,189,35,43,35,239,191,189,239,191,189,239,191,189,239,
191,189,239,191,189,239,191,189
In your case, you made a mistake as you use twice lengthb
( I guess one should have been length ). Check the internal representation of your string:
SQL> select dump('100749 ¬ 100749 ¬ ',1016) from dual ;
DUMP('100749??100749??',1016)
------------------------------------------------------------------------------------------------------------------------
Typ=96 Len=28 CharacterSet=AL32UTF8: 31,30,30,37,34,39,20,ef,bf,bd,ef,bf,bd,20,31,30,30,37,34,39,20,ef,bf,bd,ef,bf,bd,20
SQL>
Upvotes: 2