Reputation: 748
I have a list of oracle datatypes (hundreds of them), which I would like to programmatically map to SQLAlchemy Datatypes.
Some example oracle datatypes (among many others) are:
XMLTYPE
VARCHAR2
UROWID
URITYPE
UNDEFINED
TIMESTAMP(9)
I am querying data from oracle into a pandas dataframe, and then using sql alchemy to load that data to another system. However I need to specify what the target datatypes should be in the target system. (Just using the dataframe defaults doesn't work due to the ETL process being batched; the first batch of values can "mislead" the datatype.)
Here is a list of SQLAlchemy datatypes, and I need to map the oracle datatypes to the sqlalchemy ones. Is there an elegant way to do so other than manually constructing such a dictionary?
EDIT: The specific example datatypes I gave were not important. I would rather programmatically and dynamically assign an arbitrary column from Oracle to the correct SQLAlchemy datatype. The datatypes are the response to
SELECT
DISTINCT data_type
FROM
all_tab_columns;
which returned upwards of 150 values. So I'd really prefer to avoid manually looking up documentation on each.
Upvotes: 0
Views: 1926
Reputation: 167972
The XMLTYPE
data type has getStringVal
and getClobVal
methods (depending on length of the content) you can use to extract the XML content which would just be a string value and could use sqlalchemy.types.Text
(after calling the getClobVal
method).
VARCHAR2
is just a variable-length string. Oracle does not have a VARCHAR
data type, this is its equivalent (and VARCHAR
is currently a synonym of VARCHAR2
). The equivalent is sqlalchemy.types.String
.
UROWID
is documented to be:
Each row in a database has an address. However, the rows of some tables have addresses that are not physical or permanent or were not generated by Oracle Database. For example, the row addresses of index-organized tables are stored in index leaves, which can move. Rowids of foreign tables (such as DB2 tables accessed through a gateway) are not standard Oracle rowids.
Oracle uses universal rowids (urowids) to store the addresses of index-organized and foreign tables. Index-organized tables have logical urowids and foreign tables have foreign urowids. Both types of urowid are stored in the
ROWID
pseudocolumn (as are the physical rowids of heap-organized tables).
It makes no sense to persist a UROWID
outside the database as the data it represents may not be permanent and may move and, even if it does not move, it just represents a memory-location and not any actual data.
URITYPE
represents a URI
to another resource and you can just use its getURL
method to get the URI it represents, which should just be a string value. The equivalent data-type would be sqlalchemy.types.String
or sqlalchemy.types.Text
(after calling the getURL
method to get the string representation).
UNDEFINED
is not a data type. You should check if someone has created a user-defined type called UNDEFINED
.
TIMESTAMP(9)
is just a TIMESTAMP
with a precision of 9
decimal seconds digits. From the documentation, it appears you want sqlalchemy.dialects.oracle.DATE
.
The datatypes are the response to
SELECT DISTINCT data_type FROM all_tab_columns;
Please do not rely on this list as this includes all the data types used in the tables of the *SYS
schemas and you should not touch most of these tables as changing them can have unforeseen consequences (including making your database unusable).
Most of those data types are going to be private data types used within the internal workings of the database; you can determine those as they commonly have a $
in the type name. The next most common grouping is going to be the SDO geometry data types and these all have the SDO_
prefix.
So if your query then becomes:
SELECT CASE
WHEN data_type LIKE '%$%' THEN 'Private Type'
WHEN data_type LIKE 'SDO_%' THEN 'SDO Geometry Type'
ELSE data_type
END AS data_type,
COUNT( DISTINCT data_type ) AS num_instances
FROM all_tab_columns
GROUP BY
CASE
WHEN data_type LIKE '%$%' THEN 'Private Type'
WHEN data_type LIKE 'SDO_%' THEN 'SDO Geometry Type'
ELSE data_type
END
ORDER BY num_instances DESC, data_type ASC;
Then, on a db<>fiddle you get this output, with zero user-generated tables (so only for the system generated tables):
DATA_TYPE | NUM_INSTANCES :--------------------------- | ------------: Private Type | 130 SDO Geometry Type | 5 ANYDATA | 1 BINARY_DOUBLE | 1 BLOB | 1 CHAR | 1 CLOB | 1 DATE | 1 DS_VARRAY_4_CLOB | 1 FLOAT | 1 HSBLKNAMLST | 1 HSBLKVALARY | 1 INTERVAL DAY(3) TO SECOND(0) | 1 INTERVAL DAY(3) TO SECOND(2) | 1 INTERVAL DAY(9) TO SECOND(6) | 1 LONG | 1 LONG RAW | 1 NUMBER | 1 NVARCHAR2 | 1 RAW | 1 ROWID | 1 TIMESTAMP(0) | 1 TIMESTAMP(3) | 1 TIMESTAMP(6) | 1 TIMESTAMP(6) WITH TIME ZONE | 1 TIMESTAMP(9) | 1 UNDEFINED | 1 VARCHAR2 | 1 XMLTYPE | 1
And 130 of the 162 data types are private and another 5 are SDO geometry types.
Investigating the other "interesting types":
SELECT owner, table_name, column_name, data_type
FROM ALL_TAB_COLUMNS
WHERE data_type IN ( 'UNDEFINED', 'HSBLKVALARY', 'HSBLKNAMLST', 'ROWID', 'ANYDATA', 'DS_VARRAY_4_CLOB' )
ORDER BY owner, table_name, data_type
Outputs:
OWNER | TABLE_NAME | COLUMN_NAME | DATA_TYPE :----- | :---------------------------- | :-------------------- | :--------------- CTXSYS | CTX_USER_PENDING | PND_ROWID | ROWID CTXSYS | DRV$PENDING | PND_ROWID | ROWID CTXSYS | DRV$UNINDEXED | UNX_ROWID | ROWID CTXSYS | DRV$UNINDEXED2 | UNX_ROWID | ROWID CTXSYS | DRV$WAITING | WTG_ROWID | ROWID MDSYS | SDO_GR_MOSAIC_0 | RID | ROWID MDSYS | SDO_GR_MOSAIC_1 | RID | ROWID MDSYS | SDO_GR_MOSAIC_2 | RID | ROWID SYS | ALL_SCHEDULER_JOB_ARGS | ANYDATA_VALUE | ANYDATA SYS | ALL_SCHEDULER_PROGRAM_ARGS | DEFAULT_ANYDATA_VALUE | ANYDATA SYS | ALL_SQLSET_BINDS | VALUE | ANYDATA SYS | ALL_STREAMS_MESSAGE_CONSUMERS | NOTIFICATION_CONTEXT | ANYDATA SYS | ALL_SUMDELTA | HIGHROWID | ROWID SYS | ALL_SUMDELTA | LOWROWID | ROWID SYS | HS$_PARALLEL_METADATA | PARTITION_COL_TYPES | HSBLKNAMLST SYS | HS$_PARALLEL_METADATA | PARTITION_COL_NAMES | HSBLKNAMLST SYS | HS_PARALLEL_METADATA | PARTITION_COL_NAMES | HSBLKNAMLST SYS | HS_PARALLEL_METADATA | PARTITION_COL_TYPES | HSBLKNAMLST SYS | HS_PARALLEL_PARTITION_DATA | PARTITION_COL_TYPES | HSBLKNAMLST SYS | HS_PARALLEL_PARTITION_DATA | PARTITION_COL_NAMES | HSBLKNAMLST SYS | HS_PARALLEL_PARTITION_DATA | HIGH_VALUE | HSBLKVALARY SYS | HS_PARALLEL_PARTITION_DATA | LOW_VALUE | HSBLKVALARY SYS | ORA_KGLR7_IDL_SB4 | PIECE | UNDEFINED SYS | ORA_KGLR7_IDL_UB2 | PIECE | UNDEFINED SYS | USER_COMPARISON_ROW_DIF | LOCAL_ROWID | ROWID SYS | USER_COMPARISON_ROW_DIF | REMOTE_ROWID | ROWID SYS | USER_PARALLEL_EXECUTE_CHUNKS | END_ROWID | ROWID SYS | USER_PARALLEL_EXECUTE_CHUNKS | START_ROWID | ROWID SYS | USER_SCHEDULER_JOB_ARGS | ANYDATA_VALUE | ANYDATA SYS | USER_SCHEDULER_PROGRAM_ARGS | DEFAULT_ANYDATA_VALUE | ANYDATA SYS | USER_SQLSET_BINDS | VALUE | ANYDATA SYS | USER_SQLTUNE_BINDS | VALUE | ANYDATA SYS | USER_SR_STLOG_EXCEPTIONS | BAD_ROWID | ROWID SYS | USER_SUBSCR_REGISTRATIONS | ANY_CONTEXT | ANYDATA SYS | _USER_COMPARISON_ROW_DIF | RMT_ROWID | ROWID SYS | _USER_COMPARISON_ROW_DIF | LOC_ROWID | ROWID SYS | _user_stat_varray | CL1 | DS_VARRAY_4_CLOB XDB | XDB$ROOT_INFO_V | RESOURCE_ROOT | ROWID
These are all *SYS
tables or private tables and you are almost certainly never going to want to interact with them directly.
The UNDEFINED
type is interesting as it does appear to be undefined:
SELECT owner, type_name
FROM ALL_TYPES
WHERE TYPE_NAME = 'UNDEFINED';
Returns zero rows and:
CREATE TABLE TABLE_NAME ( id UNDEFINED );
Raises the exception ORA-00902: invalid datatype
.
What would be a better determination of the types you are using? Just look in the schemas you have created:
SELECT DISTINCT owner, data_type
FROM all_tab_columns
WHERE owner IN ( 'USER1', 'USER2', 'USER3' )
Then you know that those data types are the ones in use within the tables your users have created. You should find that, unless you are doing something esoteric, most (if not all) of the types you are using are going to be handled natively by SQLAlchemy.
db<>fiddle here
Upvotes: 1