Reputation: 541
I wanted to have a view of a table at the end of each day at midnight in the PST timezone. These tables are very small, only 300 entries per day on average.
I want to track the change in the rows based on the ids of the table and take a snap shot of their state each day, where each new table view would have a 'date' status.
The challenge is that the original table is growing so each new 'snapshot' will be a different size.
Here is an example of my data:
Day 1
Id Article Title Genre views date
1, "I Know Why the Caged Bird Sings", 10, 01-26-2019
2, "In the Cold", "Non-Fiction", 20, 01-26-2019
Day 2
Id Article Title Genre views date
1, "I Know Why the Caged Bird Sings", "Non-Fiction", 20, 02-27-2019
2, "In the Cold", "Non-Fiction", 40, 02-27-2019
3, "Bury My Heart At Wounded Knee", "Non-Fiction", 100, 02-27-2019
I have a stored procedure that I would like to create to copy the state of the current table. However it is not recommended to create a table in a stored procedure, so I am trying to create a task that manages the table creation and stored procedure call:
USE WAREHOUSE "TEST";
CREATE DATABASE "Snapshots";
USE DATABASE "Snapshots";
Create or Replace Table ArticleLibrary (id int, Title string, Genre string, Viewed number, date_captured timestamp );
INSERT INTO ArticleLibrary Values
(1, 'The man who walked the moon', 'Non-Fiction', 10, CURRENT_DATE() ),
(2, 'The girl who went to Vegas', 'Fiction', 20 , CURRENT_DATE())
;
SELECT * FROM ArticleLibrary;
//CREATE Stored Procedure
create procedure Capture_daily()
Returns string
LANGUAGE JAVASCRIPT
AS
$$
var rs = snowflake.execute({sqlText: "})
var rs = snowflake.execute( {sqlText: "COPY INTO "ARTICLELIBRARY"+CURRENT_DATE() * FROM ArticleLibrary; "} ); );
return 'success';
$$
;
CREATE OR REPLACE TASK IF NOT EXISTS Snapshot_ArticleLibrary_Task
WAREHOUSE = 'TEST'
SCHEDULE = '{ 1440 MINUTE | USING CRON 0 America/Los_Angeles }'
AS
CREATE TABLE "ARTICLELIBRARY"+"CURRENT_DATE()";
CALL Capture_daily();
//Run tomorrow
INSERT INTO ArticleLibrary Values
(3, 'The Best Burger in Town', 'News', 100, CURRENT_DATE());
I need some help improving the Stored Procedure and task I set up, I am not sure how to call more than one sql statement at the end of the task.
I am open to advice on how to better achieve this, considering this is a small amount of data and just an experiment to demonstrate compute cost on a small scale. I also am considering using a window function with a window frame in one large table that inserts data from each new day, where the date is the status, the ids would then not be unique.
Upvotes: 2
Views: 4044
Reputation: 1
If you're only looking for 30 days of snapshots (or anything up to 90 days), and you're not on Standard edition, you can definitely use Time Travel for this. A query such as the following will give you the data as of a certain time, and I believe that you could join to a mini-time dimension and use the dates from that to query for multiple dates at once.
SELECT * FROM table AT(TIMESTAMP=>'2024-03-18'::timestamp_tz)
Upvotes: 0
Reputation: 196
For the question: I am not sure how to call more than one sql statement at the end of the task.
One of the approaches would be to embed the multiple sql commands in their desired order in a stored procedure and call this stored procedure through the Task.
create or replace procedure capture_daily()
returns string
language javascript
as
$$
var sql_command1 = snowflake.createStatement({ sqlText:`Create or Replace Table
"ARTICLELIBRARY".....`});
var sql_command2 = snowflake.createStatement({ sqlText:'COPY INTO
"ARTICLELIBRARY" ...`});
var sql_command3 = snowflake.createStatement({ sqlText:"Any Other DML
Command OR CALL sp_name"});
try
{
sql_command1.execute();
sql_command2.execute();
sql_command3.execute();
return "Succeeded.";
}
catch (err)
{
return "Failed: " + err;
}
$$
;
CREATE OR REPLACE TASK IF NOT EXISTS Snapshot_ArticleLibrary_Task
WAREHOUSE = 'TEST'
SCHEDULE = '{ 1440 MINUTE | USING CRON 0 America/Los_Angeles }'
AS
CALL Capture_daily();
Upvotes: 0
Reputation: 6279
Since you're talking about daily snapshots and such a small amount of data I would insert each days snapshot into a single table with the current_date()
as a new column called "snapshot_id" for example.
You can have a view on top of this table that shows the latest day or even a UDF that can take the day as a parameter and return the results for any day. This table will be extremely quick since it'll be naturally clustered by the "snapshot_id" column and you will have all of your history in one spot which is nice and clean.
I've done this in the past where our source tables had millions of records and you can get quite far with this approach.
Upvotes: 3
Reputation: 7369
I would recommend leveraging the Zero-Copy Cloning functionality of Snowflake for this. You can create a clone every day with a simple command, it will take no time, and if the underlying data isn't completely changing every day, then you're not going to use any additional storage, either.
https://docs.snowflake.net/manuals/sql-reference/sql/create-clone.html
You would still need an SP to dynamically create the table name based on the date, and you can execute that SP from a TASK.
Upvotes: 2