Evan Jennings

Reputation: 1

Snowflake: In a MERGE or COPY command (from external stage) can I specify that only the newest csv file should be merged/copied?

Right now, our MERGE/COPY commands point to an S3 folder. Whenever there is more than one CSV file in the S3 folder, Snowflake throws a "duplicate rows" error. I manually move S3 files each morning so that there's only ever one file in the folder. How can I tell Snowflake to MERGE/COPY only the newest CSV file in the folder? (NOTE: date/time is part of our naming convention for these CSV files.)

Upvotes: 0

Views: 2393

Answers (1)

CodeMonkey

Reputation: 477

Assuming you are using Dell Boomi to execute your COPY INTO command, are multiple files coming into your S3 bucket in the same load or are they incrementally loading?

If they are loading incrementally, I would set PURGE = TRUE on your COPY INTO statement. Once a file has been copied successfully it is deleted from your S3 bucket, so when the next file comes in there won't be a conflict copying to your stage table. PURGE = TRUE requires permissions to be set up correctly so that Snowflake is allowed to delete from your S3 bucket.

https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#purging-files-after-loading
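
A minimal sketch of what that could look like, reusing the placeholder names YourTable, YourStage and YourDirectory from this answer. The FILE_FORMAT settings are an assumption about your files, not something taken from the question:

```sql
-- Sketch only: object names are placeholders and the file format is assumed.
COPY INTO YourTable
  FROM @YourStage/YourDirectory/
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)  -- assumed; adjust to match your CSV files
  PURGE = TRUE;                               -- delete each file from the stage after it loads successfully
```

As far as I know, a failed purge does not raise an error, so it is worth checking the bucket after the first run to confirm that the loaded file was actually removed.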

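If you want to get really clever, you can also try something like the following. The SELECT inside a COPY INTO does not accept a WHERE clause or window functions, so this version uses INSERT ... SELECT against the stage instead. It keeps only the rows from the file whose name sorts last, which is the newest file as long as the date/time in your file names sorts chronologically:

INSERT INTO YourTable
    (
    RAW_FILE_NAME
    , RAW_FILE_ROW_NUMBER
    , ColPK
    , ColVal
    , ColVal2
    )
  -- DENSE_RANK gives every row of the newest file rank 1
  -- (ROW_NUMBER would keep only a single row).
  SELECT metadata$filename, metadata$file_row_number, t.$1, t.$2, t.$3
  FROM @YourStage/YourDirectory/ t
  QUALIFY DENSE_RANK() OVER (ORDER BY metadata$filename DESC) = 1;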

Upvotes: 0
