Woody1193
Woody1193

Reputation: 7960

Snowflake CSV file format does not handle timestamps as expected

I have a table in Snowflake with a timestamp column, defined like so:

last_updated_utc TIMESTAMP_NTZ NOT NULL

and I have a CSV file format defined like this:

CREATE FILE FORMAT IF NOT EXISTS MY_DATABASE.MY_SCHEMA.assets_format
    TYPE = CSV
    COMPRESSION = AUTO
    FIELD_DELIMITER = '|'
    SKIP_HEADER = 1
    SKIP_BLANK_LINES = TRUE
    DATE_FORMAT = 'YYYY-MM-DD'
    TIME_FORMAT = 'HH:MI:SS.FFTZH'
    TIMESTAMP_FORMAT = 'YYYY-MM-DDTHH:MI:SS.FFTZH'
    TRIM_SPACE = TRUE

The problem I'm having is that, if I attempt to copy staged data using this copy statement:

COPY INTO MY_DATABASE.MY_SCHEMA.my_table 
    FROM (SELECT s.$1, s.$2, s.$3, s.$4, s.$5, s.$6, s.$7, s.$8, 
            s.$9 IF s.$9 IS NOT NULL ELSE '', s.$10, s.$11, s.$12, 
            s.$13, s.$14, s.$15, s.$16
        FROM @MY_DATABASE.MY_SCHEMA.%my_table/Pdata s)
    FILE_FORMAT = ( FORMAT_NAME = 'MY_DATABASE.MY_SCHEMA.assets_format' )
    PATTERN = '.*[.]csv[.]gz' 
    PURGE = TRUE 
    ON_ERROR = 'SKIP_FILE'

I get the following error, very deep in the file:

Timestamp '2022-08-11T00:00:00Z' is not recognized.

I figured this was an issue with how I had defined my timestamp format, so I tried the following:

USE DATABASE MY_DATABASE;
ALTER SESSION SET TIMESTAMP_INPUT_FORMAT = 'YYYY-MM-DDTHH:MI:SS.FFTZH';
SELECT TO_TIMESTAMP_NTZ('2022-08-11T00:00:00Z')

but this worked. So, I'm not sure what I'm actually doing wrong here.

Upvotes: 0

Views: 1330

Answers (1)

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25903

Interestingly with the default TIMESTAMP_INPUT_FORMAT:

ALTER SESSION UNSET TIMESTAMP_INPUT_FORMAT;
select column1
    ,try_to_timestamp(column1) as t1
    ,try_to_timestamp_ntz(column1) as t2
    ,try_to_timestamp_ntz(column1, 'YYYY-MM-DDTHH:MI:SS.FFTZH') as t3
    ,try_to_timestamp_tz(column1, 'YYYY-MM-DDTHH:MI:SS.FFTZH') as t4
    ,try_to_timestamp_ntz(column1, 'YYYY-MM-DDTHH:MI:SS.FF') as t5
    ,try_to_timestamp_ntz(column1, 'YYYY-MM-DDTHH:MI:SS.FFZ') as t6
from values
('2022-08-11T00:00:00Z'),
('2022-08-11T00:00:00');

gives:

COLUMN1 T1 T2 T3 T4 T5 T6
2022-08-11T00:00:00Z 2022-08-11 00:00:00.000 2022-08-11 00:00:00.000 null null null null
2022-08-11T00:00:00 2022-08-11 00:00:00.000 2022-08-11 00:00:00.000 null null 2022-08-11 00:00:00.000 null

and

SELECT TO_TIMESTAMP_NTZ('2022-08-11T00:00:00Z')

gives:

TO_TIMESTAMP_NTZ('2022-08-11T00:00:00Z')
2022-08-11 00:00:00.000

Thus the format is not actually correct for the timestamps (as per the first results) and the inbuilt parsers handle your string by default. If I change you use your format, all timestamps fail to parse.

So given you have no Millisecond in you input string I would drop the FF clause, and given you do not have Timezone Hours I would drop TZH

but for me on the default timestamp formats, all three work:

select column1
    ,try_to_timestamp_ntz(column1) as t1
    ,try_to_timestamp_ntz(column1, 'YYYY-MM-DDTHH:MI:SS.FFTZH') as t2
    ,try_to_timestamp_ntz(column1, 'YYYY-MM-DDTHH:MI:SSZ') as t3
from values
('2022-08-11T00:00:00Z');
COLUMN1 T1 T2 T3
2022-08-11T00:00:00Z 2022-08-11 00:00:00.000 null 2022-08-11 00:00:00.000

Upvotes: 1

Related Questions