Paul Grubb
Paul Grubb

Reputation: 41

SnowFlake Stored Procedure Multi Threading

Being new to Snowflake I am trying to understand how to write JavaScript based Stored Procedures (SP) to take advantage of multi-thread/parallel processing. My background is SQL Server and writing SP, taking advantage of performance feature such as degrees of parallelism, worker threads, indexing, column store segment elimination.

I am started to get accustomed to setting up the storage and using clustering keys, micro partitioning, and any other performance feature available, but I don't get how Snowflake SPs break down a given SQL statement into parallel streams. I am struggling to find any documentation to explain the internal workings. My concern is producing SPs that serialise everything on one thread and become bottlenecks. I am wondering if I am applying the correct technique/ need a different mindset to developing SPs. I hope I have explained my concern sufficiently. In essence I am building a PoC to migrate an on-premise SQL Server DWH ETL solution to Snowflake/Matillion ELT solution, one aspect being evaluating the compute virtual warehouse size I need.

Upvotes: 4

Views: 2470

Answers (1)

Simeon Pilgrim
Simeon Pilgrim

Reputation: 26068

stateless UDF will run in parallel by default, this what I observed when do large amount of binary data importing via base64 encoding.

stateful UDF's run in parallel on the date as controlled by the PARTITION BY and ORDER BY clauses used on the data. The only trick to remember is to always force initialize your data, as the javascript instance can be used on subsequent PARTITON BY batches, thus don't rely on check for undefined to know if it's the first row.

Upvotes: 1

Related Questions