Randy B.
Randy B.

Reputation: 453

Snowflake Unique column allowing duplicate entries

I have a simple table like this:

    CREATE OR REPLACE TABLE ETL_LOG (
       NAME VARCHAR(1000) NOT NULL,
       SCHEMA_NAME VARCHAR(1000) NOT NULL,
       QUERY_TEXT VARCHAR(50000) NOT NULL,
       STATE VARCHAR(1000) NOT NULL,
       ERROR_CODE VARCHAR(1000) NULL,
       ERROR_MESSAGE VARCHAR(500000) NULL,
       SCHEDULED_TIME TIMESTAMP_LTZ(3) NOT NULL,
       NEXTS_SCHEDULED_TIME TIMESTAMP_LTZ(3) NULL,
       COMPLETED_TIME TIMESTAMP_LTZ(3) NOT NULL,
       RUN_ID VARCHAR(5000) NOT NULL,
       UNIQUE(RUN_ID)

     );

When I insert data, despite the unique run it I get stuff like this. No idea why this might be. I have only displayed the unique value (RUN_ID) and completed time here. What causes this? Whitespaces not present in the actual data.

Screenshot of actual data.

+-------------------------------+-------------------------+
|        COMPLETED_TIME         |         RUN_ID          |
+-------------------------------+-------------------------+
| 2020-04-30 01:05:30.034 -0700 |      1588233900020      |
| 2020-04-30 01:05:30.034 -0700 |       1588233900020     |
| 2020-04-30 01:06:17.659 -0700 |           1588233960000 |
| 2020-04-30 01:06:17.659 -0700 |           1588233960000 |
+-------------------------------+-------------------------+

Upvotes: 0

Views: 1146

Answers (2)

Randy B.
Randy B.

Reputation: 453

Due to the limitations pointed out above I ended up creating this work around.

CREATE OR REPLACE table ETL_LOG_DEDUP as select distinct * from ETL_LOG;

I put this in a task that will simply retrieve the non duplicated rows from the original. Maybe not ideal but it will do what I need it to. I created a third task that just truncates the primary table once a week.

Just for context we have something like ~3k tasks running daily and several other monitoring solutions along the data pipeline. I just wanted a very clear and efficient log due to the limitations of the native logging feature in Snowflake. As it only contains task history for a limited time.

Upvotes: 0

Gokhan Atil
Gokhan Atil

Reputation: 10079

I think the title is misleading. If you are using Snowflake (according to the screenshot and the tag you choose), please note that Snowflake does not enforce primary keys:

Snowflake supports defining and maintaining constraints, but does not enforce them, except for NOT NULL constraints, which are always enforced.

https://docs.snowflake.com/en/sql-reference/constraints-overview.html#supported-constraint-types

Upvotes: 1

Related Questions