Maran

Reputation: 171

snowflake setting task dependencies

I have a task that needs to be executed after the successful completion of several different predecessor tasks.

Say, for example, the three tasks below trigger at the same time and each calls a different stored procedure.

CREATE TASK myschema.mytask_1
    WAREHOUSE = mywh
    schedule='USING CRON 0 8 * * MON America/Los_Angeles'
    as call myschema.MY_PROC_1();

CREATE TASK myschema.mytask_2
    WAREHOUSE = mywh
    schedule='USING CRON 0 8 * * MON America/Los_Angeles'
    as call myschema.MY_PROC_2();


CREATE TASK myschema.mytask_3
    WAREHOUSE = mywh
    schedule='USING CRON 0 8 * * MON America/Los_Angeles'
    as call myschema.MY_PROC_3();

However, I want the 4th task below to be executed only after all three of the above complete successfully. If any one of them fails, the 4th shouldn't trigger. In short, the 4th task depends on the completion of all three tasks above. I have read through some Snowflake docs and found that only one task can be set as a dependency. For now I can only think of the approach below, chaining the tasks one after the other. Also, I'm not sure how to evaluate the successful completion of the prior task before proceeding further. Can someone please help me achieve this in a better way? Any help on this is much appreciated.

CREATE TASK myschema.mytask_1
    WAREHOUSE = mywh
    schedule='USING CRON 0 8 * * MON America/Los_Angeles'
    as call myschema.MY_PROC_1();

CREATE TASK myschema.mytask_2
    WAREHOUSE = mywh
    AFTER myschema.mytask_1
    as call myschema.MY_PROC_2();


CREATE TASK myschema.mytask_3
    WAREHOUSE = mywh
    AFTER myschema.mytask_2
    as call myschema.MY_PROC_3();
    
CREATE TASK myschema.mytask_4
    WAREHOUSE = mywh
    AFTER myschema.mytask_3
    as call myschema.MY_PROC_4();

Upvotes: 0

Views: 3041

Answers (2)

Francesco Quaratino

Reputation: 590

Although the solution with Streams suggested by Mike Walton is fascinating, implementing a single stored procedure (possibly within an explicit transaction so that it rolls back when an error occurs) might be a simpler and, therefore, more maintainable solution. Having said that, if performance is key you may want to opt for the Streams option, because it lets the code in the different stored procedures run concurrently, whereas the single-SP approach runs them sequentially.
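
A minimal sketch of that single-SP approach, written in Snowflake Scripting (the procedure name MY_PROC_ALL, the choice of Snowflake Scripting, and the inclusion of MY_PROC_4 as the final step are assumptions, not part of the original post):

CREATE OR REPLACE PROCEDURE myschema.MY_PROC_ALL()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
    BEGIN TRANSACTION;
    -- the three formerly parallel steps now run one after the other
    CALL myschema.MY_PROC_1();
    CALL myschema.MY_PROC_2();
    CALL myschema.MY_PROC_3();
    -- the final step only runs if the three calls above succeeded
    CALL myschema.MY_PROC_4();
    COMMIT;
    RETURN 'OK';
EXCEPTION
    WHEN OTHER THEN
        ROLLBACK;
        RAISE;  -- surface the error so the task run is recorded as failed
END;
$$;

-- a single scheduled task then replaces the whole chain
CREATE OR REPLACE TASK myschema.mytask_all
    WAREHOUSE = mywh
    SCHEDULE = 'USING CRON 0 8 * * MON America/Los_Angeles'
    AS CALL myschema.MY_PROC_ALL();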

Upvotes: 1

Mike Walton

Reputation: 7339

If you run the tasks in succession, then any failure will stop the rest from executing, and when the next scheduled execution comes around, it'll start from the beginning and execute again. Depending on your logic, this may not be the behavior you're looking for.

Regarding your first option (the three tasks running in parallel), one possible solution here is to leverage streams to initiate the 4th task. It is a little bit unorthodox, but you can make it work. Here are the basic steps to look into trying:

  1. Each of the 3 parallel tasks would need to insert a record into a separate table upon successful completion of the SP, so it'd have to be the last step in the SP.
  2. Each of those 3 tables would need to have a STREAM object created (see the sketch of steps 1 and 2 just after this list).
  3. You'd schedule the task to run every minute and use a WHEN clause that looked something like the below code.
  4. You would then need to execute some additional tasks after the 4th task that did some DML statement against your streams, so that the stream would get reset.
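
Steps 1 and 2 example (a sketch; the completion-table layout and timestamp column are assumptions, only the stream names match the task example below):

-- repeat for mytask_2 and mytask_3 with their own table and stream
CREATE OR REPLACE TABLE myschema.mytask_1_done (completed_at TIMESTAMP_LTZ);
CREATE OR REPLACE STREAM mytask_1_stream ON TABLE myschema.mytask_1_done;

-- last statement inside MY_PROC_1, so the row only lands when the proc succeeds
INSERT INTO myschema.mytask_1_done VALUES (CURRENT_TIMESTAMP());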

Step 3 example:

CREATE OR REPLACE TASK mytask_4
  WAREHOUSE = xxxx
  SCHEDULE = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('mytask_1_stream') = True
   AND SYSTEM$STREAM_HAS_DATA('mytask_2_stream') = True
   AND SYSTEM$STREAM_HAS_DATA('mytask_3_stream') = True
  AS CALL myschema.MY_PROC_4();

Step 4 example:

CREATE OR REPLACE TASK mytask_5
  WAREHOUSE = xxxx
  AFTER myschema.mytask_4
  AS INSERT INTO log_table SELECT * FROM mytask_1_stream;

CREATE OR REPLACE TASK mytask_6
  WAREHOUSE = xxxx
  AFTER myschema.mytask_4
  AS INSERT INTO log_table SELECT * FROM mytask_2_stream;

CREATE OR REPLACE TASK mytask_7
  WAREHOUSE = xxxx
  AFTER myschema.mytask_4
  AS INSERT INTO log_table SELECT * FROM mytask_3_stream;

Like I said, it's a bit of a workaround, but it should work nicely in most cases. And one additional point, mytask_4 in this case will never use any compute unless all 3 of the streams contain data, which means all 3 of your previous tasks have executed successfully. Otherwise, the task will be skipped and wait for the next minute to "check again". If you are running the first 3 tasks less often, you can schedule mytask_4 to run less often, as well, if you wish.
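
If you want to verify that skip-versus-run behavior, the standard TASK_HISTORY table function in the Information Schema records a SKIPPED state for runs where the WHEN condition evaluated to false (sketch below; adjust the task name to your own):

SELECT name, scheduled_time, state, error_message
FROM TABLE(information_schema.task_history(task_name => 'MYTASK_4'))
ORDER BY scheduled_time DESC;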

Upvotes: 0
