Nimbocrux

Reputation: 519

Oracle SQL*Loader - load inconsistent date formats in the same column

A client site has supplied the following extract file for us to load into our database.

The problem is that for certain rows (the second row, for example) the CREATED_DATE and LAST_UPDATE_DATE are in "dd Mmm YYYY..." format, while the rest of the rows (such as the first) are in the "YYYY-MM-DD HH24.MI.SSXFF" format.

PRIMARY_ID  ID VALUE       CREATED_DATE          LAST_UPDATE_DATE
20166267    20834830491    2012-04-30 08:18:00   2012-04-30 08:18:00
20166536    9112           01 Oct 2010 17:27:04  01 Oct 2010 17:27:04

My questions are:

Q1. To avoid having to request a new extract, can we manipulate these "dd Mmm YYYY..." formatted dates at import time in SQL*Loader using the .ctl script? Currently my .ctl file imports these columns using:

IDENTIFIER_START_DATE   TIMESTAMP "YYYY-MM-DD HH24.MI.SSXFF",
LAST_UPDATE_DATE        TIMESTAMP "YYYY-MM-DD HH24.MI.SSXFF"

Q2. Is simply asking them for a re-extract, with all dates in the requested format, the best practice in situations like this?

Upvotes: 0

Views: 4626

Answers (2)

Egor Skriptunoff

Reputation: 23757

Use this .ctl script:

load data
append
into table schema_name.table_name
fields terminated by ';' optionally enclosed by '"'
(
PRIMARY_ID,
ID_VALUE,
-- Pick the mask per row: the second \w+ token equals the second \d+ token
-- only for the ISO-style strings ('04' in '2012-04-30 ...'); in the other
-- format the second word is the month name ('Oct' in '01 Oct 2010 ...').
CREATED_DATE "to_date(:CREATED_DATE, case when regexp_substr(:CREATED_DATE,'\w+',1,2)=regexp_substr(:CREATED_DATE,'\d+',1,2) then 'YYYY-MM-DD HH24:MI:SS' else 'dd Mon YYYY HH24:MI:SS' end)",
LAST_UPDATE_DATE "to_date(:LAST_UPDATE_DATE, case when regexp_substr(:LAST_UPDATE_DATE,'\w+',1,2)=regexp_substr(:LAST_UPDATE_DATE,'\d+',1,2) then 'YYYY-MM-DD HH24:MI:SS' else 'dd Mon YYYY HH24:MI:SS' end)"
)
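
Note that the 'dd Mon YYYY' branch relies on the session's NLS_DATE_LANGUAGE to recognize month abbreviations like Oct, so it assumes an English (or compatible) date-language setting.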

Upvotes: 1

Justin Cave

Reputation: 231781

Whether to request a re-extract of the data depends on a number of factors.

  • Is this a one-time process or an ongoing data feed? It may be perfectly reasonable to try to do your best with a one-time load of data where it is easier to eyeball the outliers. If you are going to manage an ongoing data feed, it generally makes much more sense to agree on a strict standard for the file rather than trying to manually inspect problematic rows.
  • Does the client have an incentive to make your load process simple and repeatable? Or was the client sold on a fixed price to load the data in whatever format they want to provide it? If the client has an incentive to make the load process simple and repeatable, it makes sense for them to invest the time to generate a clean file. If you've sold them a fixed price for whatever work needs to be done to turn the file into coherent data, on the other hand, they probably won't be pleased if you push a lot of that work back on them.
  • Are there rows where the data is ambiguous? For example "01-02-03" could refer to Jan 2, 2003 or Jan 2, 1903 or Feb 3, 2001 or a number of other dates. If there is ambiguity, it makes sense to request a re-extract.

As to how to load the data, while it is possible to do this in a single step, you generally wouldn't want to. It would generally make more sense to load the data into a staging table (or use an external table) where all the columns are declared as VARCHAR2 and then write some ETL logic that transforms the data into the appropriate data types (and logs errors for the data that cannot be converted). For example, if you loaded the data into a staging table where all the columns were defined as VARCHAR2, you could use something like the my_to_date function in this thread to try a number of different format masks to find one that works (if there are a lot of possible masks, you may want to iterate through a collection rather than hard-coding the two masks as I did in that example).
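
For illustration, here is a minimal sketch of such a helper, assuming just the two masks seen in this extract (my_to_date here is a stand-in for the function in that thread, which handles more cases):

create or replace function my_to_date (p_str in varchar2)
  return date
is
  -- Candidate masks, tried in order; extend this list (or drive it from a
  -- lookup table) if the feed can contain more formats.
  type mask_list is table of varchar2(30);
  l_masks mask_list := mask_list('YYYY-MM-DD HH24:MI:SS',
                                 'DD Mon YYYY HH24:MI:SS');
begin
  for i in 1 .. l_masks.count loop
    begin
      return to_date(p_str, l_masks(i));
    exception
      when others then
        null;  -- this mask did not match; try the next one
    end;
  end loop;
  return null;  -- nothing matched; the ETL step logs these rows
end my_to_date;
/

The ETL pass can then select my_to_date(created_date) from the staging table and divert any rows where it returns NULL into an error log for manual review.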

One additional point... An Oracle DATE stores the time to the second which appears to be the precision of the data you're being given. It would, therefore, seem to make more sense to load the data into a DATE column rather than a TIMESTAMP column.
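
If you do stay with a single-pass SQL*Loader load, the field definitions from the question could declare the columns as DATE instead (a sketch reusing the question's column names and the ISO mask; the mixed-format handling above would still be needed on top of this):

IDENTIFIER_START_DATE   DATE "YYYY-MM-DD HH24:MI:SS",
LAST_UPDATE_DATE        DATE "YYYY-MM-DD HH24:MI:SS"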

Upvotes: 2
