Vikram S
Vikram S

Reputation: 3

How long does Snowflake really maintain file load history?

Source #1 of 3 - https://docs.snowflake.com/en/user-guide/data-load-considerations-load.html#load-metadata says -

"Snowflake maintains detailed metadata for each table into which data is loaded ... This load metadata expires after 64 days..." followed by an explanation of the LOAD_UNCERTAIN_FILES copy option. This option tells Snowflake whether or not to load files whose metadata, being over 64 days old, has been purged.

#2 of 3 - https://docs.snowflake.com/en/user-guide/data-load-local-file-system-copy.html#monitoring-files-staged-internally says -

"Snowflake retains historical data for COPY INTO commands executed within the previous 14 days... Use the LOAD_HISTORY Information Schema view to retrieve the history of data loaded into tables using the COPY INTO command"

#3 of 3 - https://docs.snowflake.com/en/sql-reference/account-usage/copy_history.html#copy-history-view says -

"This Account Usage view can be used to query Snowflake data loading history for the last 365 days (1 year). The view displays load activity for both COPY INTO statements and continuous data loading using Snowpipe. The view avoids the 10,000 row limitation of the LOAD_HISTORY View."


Question #1 #3 seems to supersede #2, as the duration is 365 days and it maintains metadata not only for bulk loads, but continuous loading as well. Also, apparently there's a row limit on #2. The view in #3 is available only to the ACCOUNTADMIN role by default. But if Snowflake does have the information for the last 365 days, why force the use of LOAD_UNCERTAIN_FILES after just 64 days?

Question #2 Aren't sources #1 and 2 inconsistent?

Upvotes: 0

Views: 2175

Answers (1)

Greg Pavlik
Greg Pavlik

Reputation: 11046

The most important number for behavior of copying files into tables is 64 days. If you run a COPY INTO command from a stage without limiting the files by a list or pattern, Snowflake will not reload a file it's loaded within the previous 64 days.

You can override that using FORCE = TRUE in the copy options, https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#copy-options-copyoptions. That will make it load the files whether they're marked as loaded or not.

When the following three conditions are met, for the purposes of the COPY INTO command, Snowflake does not know if a file has already been loaded or not:

  • The file’s LAST_MODIFIED date (i.e. date when the file was staged) is older than 64 days.
  • The initial set of data was loaded into the table more than 64 day earlier.
  • If the file was already loaded successfully into the table, this
    event occurred more than 64 days earlier.

Under those conditions, the LOAD_UNCERTAIN_FILES option applies.

The other two times concern reporting, not behavior of the COPY INTO command. The 14 days is for the LIST or LS command information returned. The 365 days is for the data shared back from Snowflake to customers through the "snowflake" database. Data can take between 15 minutes and 3 hours to appear in this database depending on the view in question. It remains for 365 days after that.

Upvotes: 1

Related Questions