Tianxiang Xiong
Tianxiang Xiong

Reputation: 4101

Oracle SUBSTR results in longer column?

It seems that SUBSTR can result in a longer column? What's going on here?

CREATE TABLE t (
    long_field VARCHAR2(1024)
);
SELECT
    column_name,
    data_length
FROM
    all_tab_cols
WHERE
    table_name = 'T';
COLUMN_NAME DATA_LENGTH
LONG_FIELD 1024
INSERT INTO t ( long_field ) VALUES ( 'abcdef' );

CREATE TABLE t2
    AS
        SELECT
            substr(long_field, instr(long_field, 'c') + 1) short_field
        FROM
            t;
SELECT
    column_name,
    data_length
FROM
    all_tab_cols
WHERE
    table_name = 'T2';
COLUMN_NAME DATA_LENGTH
SHORT_FIELD 4096

I'm on Oracle 18:

select BANNER
from V$VERSION;
BANNER
Oracle Database 18c Express Edition Release 18.0.0.0.0 - Production

Upvotes: 1

Views: 56

Answers (2)

Alex Poole
Alex Poole

Reputation: 191560

You seem to be seeing a difference from the effects of character length semantics.

To avoid confusion with the maximum size allowed for the data type, I'm going to use 100 instead of 1024. If nls_char_semantics is set to byte then your first table is created with the column as 100 byte and that means both data_size and char_length are 100:

alter session set nls_length_semantics=byte
CREATE TABLE t (
    long_field VARCHAR2(100)
);
SELECT
    column_name,
    data_length,
    char_length,
    char_col_decl_length,
    char_used
FROM
    all_tab_cols
WHERE
    table_name = 'T';
COLUMN_NAME DATA_LENGTH CHAR_LENGTH CHAR_COL_DECL_LENGTH CHAR_USED
LONG_FIELD 100 100 100 B

When you create the second table using CTAS the data type has to be established as the table is created based on the possible values returned by the query. It isn't going to execute the query and find the longest actual value - but I don't think you're expecting it to, and it wouldn't work for empty tables if it did that. Instead it works out the longest possible result, as MT0 said, which in this case is 100.

But that is 100 characters. Based just on the column expression there is no way to know that is really limited to 100 bytes (by the base table definition), so the CTAS has to treat those 100 characters like any others; and as it's using byte semantics and a multibyte character set it has to allow for up to 400 bytes when it defines the new column.

And that means your second table is created with short_field defined as 400 byte:

CREATE TABLE t2
    AS
        SELECT
            substr(long_field, instr(long_field, 'c') + 1) short_field
        FROM
            t;
SELECT
    column_name,
    data_length,
    char_length,
    char_col_decl_length,
    char_used
FROM
    all_tab_cols
WHERE
    table_name = 'T2';
COLUMN_NAME DATA_LENGTH CHAR_LENGTH CHAR_COL_DECL_LENGTH CHAR_USED
SHORT_FIELD 400 400 400 B

fiddle

If you were using char semantics it would look more consistent:

alter session set nls_length_semantics=char
CREATE TABLE t (
    long_field VARCHAR2(100)
);
SELECT
    column_name,
    data_length,
    char_length,
    char_col_decl_length,
    char_used
FROM
    all_tab_cols
WHERE
    table_name = 'T';
COLUMN_NAME DATA_LENGTH CHAR_LENGTH CHAR_COL_DECL_LENGTH CHAR_USED
LONG_FIELD 400 100 400 C

Now char_length is still 100, but data_length is already 400, to allow for 100 multibyte characters.

The CTAS column expression still works out the maximum size as 100 characters, but now as it's using char semantics the second table is created with short_field defined as 100 char, so it looks exactly like the original:

CREATE TABLE t2
    AS
        SELECT
            substr(long_field, instr(long_field, 'c') + 1) short_field
        FROM
            t;
SELECT
    column_name,
    data_length,
    char_length,
    char_col_decl_length,
    char_used
FROM
    all_tab_cols
WHERE
    table_name = 'T2';
COLUMN_NAME DATA_LENGTH CHAR_LENGTH CHAR_COL_DECL_LENGTH CHAR_USED
SHORT_FIELD 400 100 400 C

fiddle

Upvotes: 1

MT0
MT0

Reputation: 168623

SUBSTR returns a VARCHAR2(4000) as that is the maximum length sub-string that could be returned from the function.

Just because your function substr(long_field, instr(long_field, 'c') + 1) returns 'def' doesn't mean that it changes the data-type to have a data-length that is the bounded by the data (i.e. VARCHAR2(3)) as SUBSTR could still (for other data) return a 4000 byte string.

If you want a lower data length then use CAST:

CREATE TABLE t3 AS
  SELECT CAST(
           substr(long_field, instr(long_field, 'c') + 1)
           AS VARCHAR2(10)
         ) short_field
  FROM   t;

Then the data dictionary will contain:

TABLE_NAME COLUMN_NAME DATA_LENGTH
T3 SHORT_FIELD 10
T2 SHORT_FIELD 4000

fiddle

Upvotes: 1

Related Questions