Reputation: 21
I need some guidance on the best use case for snowflake stored procedure compared with user-defined function. Please are there guidelines on scenarios where one is more appropriate than the other? I am trying to make the right selection for my project. Thanks.
Upvotes: 2
Views: 4282
Reputation: 131
The important distinction is where in the execution stack the object lives. The pattern in Snowflake is pretty typical for database systems: you have a client program that issues SQL to Snowflake, where the system creates a plan; that plan is then distributed to the execution engine, which actually does the work.
Visually:
[Client Program] --SQL--> [Planner] --plan--> [Execution Engine]
A user defined function is something that is actually built into the plan and run by the execution engine. Because it's actually part of the plan, Snowflake will do things like guarantee transactional semantics, and distribute & scale out the work across a warehouse.
A stored procedure, on the other hand, really takes the place of a client program -- it just happens to be stored & executed inside of Snowflake. That's great from a management perspective, but Snowflake can't do anything special to scale this out or provide any special guarantees around transactions. It's just a client program.
To give a concrete example of how these differ, say that I want to run a query from within my stored procedure or function. For a procedure, that's no problem: it's very much like any client issuing a query.
On the other hand, maybe I wanted my function to issue a simple query to look up a user's name from their identifier. We can't, but let's say we were able to wrap that query in a function called get_name()
. Now I might issue a query like:
SELECT id, get_name(id)
FROM my_table
But let's think through what would happen when this runs. The query includes my function, so the plan containing this call will get distributed out over my warehouse. But say my_table
had 1M records: that means 1M calls to get_name()
and 1M queries that will get issued to Snowflake.
Ouch. So the upshot is that you can't do this.
Anyway, so a long-winded answer to your question. But it really comes down to what you're trying to accomplish. If you're looking to host a client program, you want a procedure. If you're looking to host logic that you actually want to run in a query, you want a function. Sometimes you want to scale out your logic and run it in a query, but, like the case above, you can't -- then you need to get more clever about organizing your logic.
Upvotes: 1
Reputation: 10039
And a guide for choosing to Create a Stored Procedure or a UDF:
Upvotes: 5