Reputation: 53
I have found 4 ways to see the load history in Snowflake:
SELECT * from "SNOWFLAKE"."ACCOUNT_USAGE"."COPY_HISTORY"
SELECT * FROM "SNOWFLAKE"."INFORMATION_SCHEMA"."LOAD_HISTORY"
SELECT * from "SNOWFLAKE"."ACCOUNT_USAGE"."LOAD_HISTORY"
SELECT * FROM "DATABASE_NAME"."INFORMATION_SCHEMA"."LOAD_HISTORY"
What are the differences between these and when should each be used (or not used) ?
Upvotes: 5
Views: 4598
Reputation: 1105
Here things get little confusing, bare with me, there are two Load History views, a view that belongs to Information Schema and a view that belongs to Account Usage schema. As for Copy History, there are Copy History table function of Information schema and a Copy History view of Account Usage schema.
Information Schema | Account Usage |
---|---|
Load History View | Load History View |
Copy History Table Function | Copy History View |
Load History of Information Schema example:
use database db_1;
select table_name, last_load_time
from information_schema.load_history
where schema_name = current_schema() and
table_name='TABLE_1';
Load History of Account Usage example:
use database db_1;
select table_name, last_load_time
from snowflake.account_usage.load_history
where schema_name = current_schema() and
table_name='TABLE_1';
Copy History view example:
select table_name, last_load_time
from snowflake.account_usage.copy_history
where schema_name = current_schema() and
table_name='TABLE_1';
;
Pay attention the way you query Copy History view and Load History views is almost identical.
Copy History table function example:
select *
from table(
information_schema.copy_history(
TABLE_NAME=>'TABLE_1',
START_TIME=> DATEADD(hours, -1, CURRENT_TIMESTAMP())
)
)
;
If you haven't already noticed, let me clarify that using both Load History and Copy History views you can query load history of more than one table. But with Copy history table function, you are limited to querying load history of a single table at a time. Table name in Copy History table function is required.
Example:
Below query only returns load history of Table_1
.
select *
from table(
information_schema.copy_history(
TABLE_NAME=>'TABLE_1',
START_TIME=> DATEADD(hours, -1, CURRENT_TIMESTAMP())
)
)
;
Below query returns load history of (up to) 10 tables.
USE DATABASE database_a;
SELECT table_name, last_load_time
FROM information_schema.load_history
ORDER BY last_load_time DESC
LIMIT 10;
Copy History and Load History views of Account Usage have latency between the latest changes and when those changes are reflected in these views. To be more precises, Copy History view of Account Usage has up to 120 minute latency and Load History view of Account Usage has up to 90 minute latency in most of the cases. But latency might be up to two days if both of the following conditions are met:
Fewer than 32 DML statements have been added to the given table since it was last updated in COPY_HISTORY/LOAD_HISTORY.
Fewer than 100 rows have been added to the given table since it was last updated in COPY_HISTORY/LOAD_HISTORY.
As for Load History view of Information Schema and Copy History table function there is no latency.
Source | Latency of Data |
---|---|
Copy History view | usually 120 minutes (up to 2 days) |
Load History view of Account Usage | usually 90 minutes (up to 2 days) |
Load History view of Information Schema | No latency |
Copy History table function | No latency |
If you need latest data with no latency, you had better use either Load History view of Information Schema or Copy History table function.
Load History view of Information Schema and Copy History table function retains historic data for 14 days. Copy History view and Load History view of Account Usage retain historic data for 1 year (365 days).
Source | Data Retention |
---|---|
Copy History view | 365 days |
Load History view of Account Usage | 365 days |
Load History view of Information Schema | 14 days |
Copy History table function | 14 days |
Load History views (of both Information Schema and Account Usage) do NOT return the history of data loaded using Snowpipe, it only returns the history of data lated using COPY INTO command. While both Copy History table function and view return the history of data loaded using Snowpipe and Copy INTO
command.
Snowpipe is used in continuous loading.
Snowpipe enables loading data from files as soon as they’re available in a stage.
Source | Load history type |
---|---|
Copy History view | COPY INTO and Snowpipe |
Load History view of Account Usage | COPY INTO only |
Load History view of Information Schema | COPY INTO only |
Copy History table function | COPY INTO and Snowpipe |
Load History of Information Schema returns an upper limit of 10,000 rows. Copy History table function, Copy History view and Load History of Account Usage do not have this limit.
As you can see all have distinct features. To quickly grasp the differences among them, you can refer to the below table that summaries all the features of the Load History, Copy History views and Copy History table function.
Differences | Load History | Copy History | Load History | Copy History |
---|---|---|---|---|
View or Table function | view | view | view | table function |
Schema they belong to | Account Usage | Account Usage | Information Schema | Information Schema |
Multiple tables can be queried |
Yes | Yes | Yes | No |
Data Retention | 365 days | 365 days | 14 days | 14 days |
Latency | usually 90 minutes (up to 2 days) |
usually 120 minutes (up to 2 days) |
No latency | No latency |
Load history type | COPY INTO only |
COPY INTO and Snowpipe |
COPY INTO only |
COPY INTO and Snowpipe |
Maximum number of rows returned | No limit | No limit | 10,000 | No limit |
Upvotes: 1
Reputation: 6907
> SELECT * from "SNOWFLAKE"."ACCOUNT_USAGE"."COPY_HISTORY"
This is an Account Usage view that is used to query data loading history for the last 365 days, for both batch loading (COPY INTO <table>
) and continuous loading (with Snowpipe).
https://docs.snowflake.com/en/sql-reference/account-usage/copy_history.html#copy-history-view
> SELECT * from "SNOWFLAKE"."ACCOUNT_USAGE"."LOAD_HISTORY"
This is also an Account Usage view, but excludes files loaded with Snowpipe. This view may also be subject to latency of up to 90 mins.
> SELECT * FROM "DATABASE_NAME"."INFORMATION_SCHEMA"."LOAD_HISTORY"
This is an Information Schema view that is used to query the history of data loaded into tables using the COPY INTO <table>
command within the last 14 days. The view displays one row for each file loaded, does not include data loaded with Snowpipe.
https://docs.snowflake.com/en/sql-reference/info-schema/load_history.html#load-history-view
> SELECT * FROM "SNOWFLAKE"."INFORMATION_SCHEMA"."LOAD_HISTORY"
This is just a specific example of the previous one, but since the Snowflake
database is a system defined, shared, read-only database, querying this view probably won't return anything (useful).
Upvotes: 2
Reputation: 7369
The difference between account_usage and information_schema is well-documented here:
The key being the amount of history, the latency, and the access control.
The difference between COPY_HISTORY
and LOAD_HISTORY
is that COPY_HISTORY
include SNOWPIPE ingested data, while the LOAD_HISTORY
only includes data loaded via a COPY INTO
statement.
The last one is the LOAD_HISTORY
of the SNOWFLAKE.INFORMATION_SCHEMA
. This will most likely be empty as it would reflect the load history of data into that database/schema and since that is a shared database to customers from Snowflake, you won't see anything in there.
Upvotes: 3