Reputation: 4101
It seems that SUBSTR
can result in a longer column? What's going on here?
CREATE TABLE t (
long_field VARCHAR2(1024)
);
SELECT
column_name,
data_length
FROM
all_tab_cols
WHERE
table_name = 'T';
COLUMN_NAME | DATA_LENGTH |
---|---|
LONG_FIELD | 1024 |
INSERT INTO t ( long_field ) VALUES ( 'abcdef' );
CREATE TABLE t2
AS
SELECT
substr(long_field, instr(long_field, 'c') + 1) short_field
FROM
t;
SELECT
column_name,
data_length
FROM
all_tab_cols
WHERE
table_name = 'T2';
COLUMN_NAME | DATA_LENGTH |
---|---|
SHORT_FIELD | 4096 |
I'm on Oracle 18:
select BANNER
from V$VERSION;
BANNER |
---|
Oracle Database 18c Express Edition Release 18.0.0.0.0 - Production |
Upvotes: 1
Views: 56
Reputation: 191560
You seem to be seeing a difference from the effects of character length semantics.
To avoid confusion with the maximum size allowed for the data type, I'm going to use 100 instead of 1024. If nls_char_semantics
is set to byte
then your first table is created with the column as 100 byte
and that means both data_size
and char_length
are 100:
alter session set nls_length_semantics=byte
CREATE TABLE t (
long_field VARCHAR2(100)
);
SELECT
column_name,
data_length,
char_length,
char_col_decl_length,
char_used
FROM
all_tab_cols
WHERE
table_name = 'T';
COLUMN_NAME | DATA_LENGTH | CHAR_LENGTH | CHAR_COL_DECL_LENGTH | CHAR_USED |
---|---|---|---|---|
LONG_FIELD | 100 | 100 | 100 | B |
When you create the second table using CTAS the data type has to be established as the table is created based on the possible values returned by the query. It isn't going to execute the query and find the longest actual value - but I don't think you're expecting it to, and it wouldn't work for empty tables if it did that. Instead it works out the longest possible result, as MT0 said, which in this case is 100.
But that is 100 characters. Based just on the column expression there is no way to know that is really limited to 100 bytes (by the base table definition), so the CTAS has to treat those 100 characters like any others; and as it's using byte semantics and a multibyte character set it has to allow for up to 400 bytes when it defines the new column.
And that means your second table is created with short_field
defined as 400 byte
:
CREATE TABLE t2
AS
SELECT
substr(long_field, instr(long_field, 'c') + 1) short_field
FROM
t;
SELECT
column_name,
data_length,
char_length,
char_col_decl_length,
char_used
FROM
all_tab_cols
WHERE
table_name = 'T2';
COLUMN_NAME | DATA_LENGTH | CHAR_LENGTH | CHAR_COL_DECL_LENGTH | CHAR_USED |
---|---|---|---|---|
SHORT_FIELD | 400 | 400 | 400 | B |
If you were using char semantics it would look more consistent:
alter session set nls_length_semantics=char
CREATE TABLE t (
long_field VARCHAR2(100)
);
SELECT
column_name,
data_length,
char_length,
char_col_decl_length,
char_used
FROM
all_tab_cols
WHERE
table_name = 'T';
COLUMN_NAME | DATA_LENGTH | CHAR_LENGTH | CHAR_COL_DECL_LENGTH | CHAR_USED |
---|---|---|---|---|
LONG_FIELD | 400 | 100 | 400 | C |
Now char_length
is still 100, but data_length
is already 400, to allow for 100 multibyte characters.
The CTAS column expression still works out the maximum size as 100 characters, but now as it's using char semantics the second table is created with short_field defined as 100 char, so it looks exactly like the original:
CREATE TABLE t2
AS
SELECT
substr(long_field, instr(long_field, 'c') + 1) short_field
FROM
t;
SELECT
column_name,
data_length,
char_length,
char_col_decl_length,
char_used
FROM
all_tab_cols
WHERE
table_name = 'T2';
COLUMN_NAME | DATA_LENGTH | CHAR_LENGTH | CHAR_COL_DECL_LENGTH | CHAR_USED |
---|---|---|---|---|
SHORT_FIELD | 400 | 100 | 400 | C |
Upvotes: 1
Reputation: 168623
SUBSTR
returns a VARCHAR2(4000)
as that is the maximum length sub-string that could be returned from the function.
Just because your function substr(long_field, instr(long_field, 'c') + 1)
returns 'def'
doesn't mean that it changes the data-type to have a data-length that is the bounded by the data (i.e. VARCHAR2(3)
) as SUBSTR
could still (for other data) return a 4000 byte string.
If you want a lower data length then use CAST
:
CREATE TABLE t3 AS
SELECT CAST(
substr(long_field, instr(long_field, 'c') + 1)
AS VARCHAR2(10)
) short_field
FROM t;
Then the data dictionary will contain:
TABLE_NAME | COLUMN_NAME | DATA_LENGTH |
---|---|---|
T3 | SHORT_FIELD | 10 |
T2 | SHORT_FIELD | 4000 |
Upvotes: 1