In our Snowflake data warehouse, when loading data from a stage into a table with a COPY INTO statement, some records in a timestamp_ntz column display the value Invalid date in the Snowflake UI.
These Invalid date values in the timestamp_ntz column have a strange property: they match the filter

WHERE strange_timestamp_col > current_timestamp()
We would expect an invalid data format to raise an error when the COPY INTO statement executes; instead, these pseudo-timestamps with strange properties are inserted silently.
We discovered that some of the Unix timestamp values in our staged Parquet files were written as integers and some as strings!
The workaround is to always cast the column to VARCHAR and then to TIMESTAMP_NTZ.
Example using a unix timestamp:
SELECT 1620502461213752::timestamp_ntz;
-> Invalid date
SELECT 1620502461213752::varchar::timestamp_ntz;
-> 2021-05-08 19:34:21.213
SELECT '1620502461213752'::timestamp_ntz;
-> 2021-05-08 19:34:21.213
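To see why the direct cast renders as Invalid date, here is a small Python sketch (outside Snowflake, purely illustrative) comparing the two unit interpretations of the same integer:

```python
from datetime import datetime, timezone

raw = 1620502461213752  # the epoch value from the example above

# Read as microseconds (what the varchar route yields): the intended 2021 date.
secs, micros = divmod(raw, 1_000_000)
as_micros = datetime.fromtimestamp(secs, tz=timezone.utc).replace(microsecond=micros)
print(as_micros)  # 2021-05-08 19:34:21.213752+00:00

# Read as milliseconds (the direct integer cast): roughly the year 53,000,
# far beyond what a front-end date renderer can display.
approx_year = 1970 + raw // 1000 // 31_557_600  # ~31,557,600 seconds per year
print(approx_year)
```

A timestamp tens of thousands of years in the future is still a valid value server-side, which is why it compares as greater than current_timestamp() while the UI shows Invalid date.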
This seems to be because a direct integer cast to timestamp_ntz only accepts epoch timestamps in milliseconds (e.g. 1620502461213). However, when the integer is cast to varchar first, timestamp_ntz correctly interprets the epoch timestamp in microseconds (e.g. 1620502461213752). This likely applies to timestamps in nanoseconds as well, although I did not confirm that case.
So Invalid date appears to be a front-end rendering issue for timestamps far in the future, produced when the epoch timestamp's unit is misidentified.
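The unit detection can be pictured as a magnitude-based heuristic; Snowflake's TO_TIMESTAMP documentation describes cutoffs along these lines for string inputs. The function below is only an illustrative sketch of that idea, not Snowflake's actual implementation:

```python
def guess_epoch_unit(value: int) -> str:
    # 31,536,000,000 is ~1,000 years' worth of seconds, so any plausible
    # seconds-since-1970 value sits below it; each step up shifts the
    # cutoff by a factor of 1,000. (Illustrative heuristic only.)
    if value < 31_536_000_000:
        return "seconds"
    if value < 31_536_000_000_000:
        return "milliseconds"
    if value < 31_536_000_000_000_000:
        return "microseconds"
    return "nanoseconds"

print(guess_epoch_unit(1620502461))        # seconds
print(guess_epoch_unit(1620502461213))     # milliseconds
print(guess_epoch_unit(1620502461213752))  # microseconds
```

Under such a heuristic the microsecond value is classified correctly; the behavior observed above suggests the string path applies this kind of detection, while the direct integer-to-timestamp_ntz cast does not.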