Reputation: 164
I'm trying to refactor some SQL code to make it more readable and maintainable, but I don't want to destroy performance. I'm trying to move some column logic in a SELECT statement into several functions, but I'm seeing huge decreases in performance. I'm hoping you all can help me understand why, and even better, how to fix it!
After refactoring, my code looks roughly like the example below. Before refactoring, rather than function calls, the CASE statements were directly inside the SUM calls in the SELECT clause.
CREATE FUNCTION dbo.funcOne (@colA INT, @colB INT, @colC INT, @valX INT, @valY INT)
RETURNS INT AS
BEGIN
    -- parameter types assumed INT to match the RETURNS INT signature
    RETURN CASE
        WHEN @colA = @colB
            THEN @valX + @valY
        WHEN @colC BETWEEN 1 AND 10
            THEN @valX
        ELSE 0
    END
END
CREATE FUNCTION dbo.funcTwo (@colA INT, @colB INT, @colC INT, @valX INT, @valY INT)
RETURNS INT AS
BEGIN
    RETURN CASE
        WHEN @colA <> @colB
            THEN @valX + @valY
        WHEN @colC BETWEEN 1 AND 10
            THEN @valY
        ELSE 0
    END
END
SELECT mt.[Ident]
,SUM(dbo.funcOne(mt.[colA], ot.[colB], ot.[colC], mt.[valX], ot.[valY])) AS funcOne
,SUM(dbo.funcTwo(mt.[colA], ot.[colB], ot.[colC], mt.[valX], ot.[valY])) AS funcTwo
FROM MyTable AS mt
INNER JOIN SomeOtherTable AS ot
ON mt.[Ident] = ot.[Ident]
WHERE mt.[colA] BETWEEN 1 AND 100
GROUP BY mt.[Ident]
Before refactoring, the query takes about 60 seconds to run. After refactoring, it takes nearly 7 minutes! The scan and read counts are identical, so it's strange to me that it takes so much longer.
What is SQL Server doing that makes it so inefficient after the refactor? Is there any way to resolve this AND keep my nice readable code?
Thanks for all the "why?" information, @conor-cunningham-msft.
In terms of resolving the performance issue, I ended up using the suggestion from @Simonare and others.
Here's what my code looks like:
CREATE FUNCTION dbo.funcOne (@colA INT, @colB INT, @colC INT, @valX INT, @valY INT)
RETURNS TABLE AS
RETURN (
    SELECT CASE
        WHEN @colA = @colB
            THEN @valX + @valY
        WHEN @colC BETWEEN 1 AND 10
            THEN @valX
        ELSE 0
    END AS [MyValue]
)
CREATE FUNCTION dbo.funcTwo (@colA INT, @colB INT, @colC INT, @valX INT, @valY INT)
RETURNS TABLE AS
RETURN (
    SELECT CASE
        WHEN @colA <> @colB
            THEN @valX + @valY
        WHEN @colC BETWEEN 1 AND 10
            THEN @valY
        ELSE 0
    END AS [MyValue]
)
SELECT mt.[Ident]
,SUM(funcOne.[MyValue]) AS funcOneValue
,SUM(funcTwo.[MyValue]) AS funcTwoValue
FROM MyTable AS mt
INNER JOIN SomeOtherTable AS ot
ON mt.[Ident] = ot.[Ident]
CROSS APPLY dbo.funcOne(mt.[colA], ot.[colB], ot.[colC], mt.[valX], ot.[valY]) AS funcOne
CROSS APPLY dbo.funcTwo(mt.[colA], ot.[colB], ot.[colC], mt.[valX], ot.[valY]) AS funcTwo
WHERE mt.[colA] BETWEEN 1 AND 100
GROUP BY mt.[Ident]
This does perform a little slower than before refactoring, but the degradation is minimal and, I think, worth it for the sake of maintainability and readability.
Upvotes: 2
Views: 7753
Reputation: 4491
There are a couple of reasons why scalar functions in T-SQL have historically been detrimental to query performance (although that will hopefully improve soon - I'll explain at the end): they are executed interpretively and invoked once per qualifying row rather than being expanded into the query plan; the optimizer treats them as opaque and assigns them no cost at all; and their presence forces the query to run serially, ruling out parallel plans.
Generally, most guidance over the past 10+ years for SQL Server has recommended against using scalar T-SQL functions for the reasons I've explained, and most of the external content you will find will likely align with this notion. Please note that historically SQL Server did inline single-statement T-SQL table-valued functions (treating them much as a view is treated), but this is a complete historical artifact and is obviously inconsistent with the T-SQL scalar function treatment.
The QP team at Microsoft has known about these issues for some time. However, fixing them required a lot of work to get the system into a form where scalar T-SQL function inlining would generally help all customers and not cause some queries to get slower.
Unfortunately, the way that most commercial optimizers work creates a model that estimates runtime based on some assumptions about how a computer works. That model will be incomplete (example: we don't cost T-SQL scalar functions at all today, as I noted). One non-obvious side effect of having a model is that some queries will be outside the model (meaning the optimizer is guessing or using incomplete data) and get a great plan, while other queries will be outside the model and get a horrible plan. Queries inside the model don't always get a great plan, but they do better on average. Going one step further, if the costs or the set of considered alternatives changes from one major release of SQL to the next, then when you upgrade you will potentially start getting different plans than before. For those "outside the model" cases, the effect is pretty random - you can get a much faster or a much slower plan. As such, it becomes very difficult to change the optimizer's cost model without a set of mechanisms to find and prevent plan regressions - otherwise lots of customers will have queries that were "tuned" for one set of incomplete assumptions and then get a far worse plan when those assumptions change from under them.
Net-net: the optimizer team did not change the cost model to fix this because it would have caused more customer harm, on average, than dealing with the pain until there were enough mechanisms in place to give a good customer experience when upgrading.
Over the past few releases, that's exactly what the SQL team has been doing. First, any changes to the cost model or to the set of plans considered (called the search space) were tied to the compatibility_level of the database. This allows customers to upgrade, keep the old compatibility level, and thus generally not see plan changes on the same hardware. It also allows customers to try the new level and immediately move back down if a workload has a problem, greatly de-risking upgrades, which were previously one-way. You can read more about the upgrade advice here. Second, the SQL team added a "flight data recorder" for plan choices over time, called the Query Store. It captures prior plans and the performance of those plans, which allows customers to "go back" to a prior plan if it was faster (even if you were impacted by one of these out-of-model cases). This provides another level of insurance against breaking an application on upgrade.
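To make those two mechanisms concrete, here is a minimal sketch (the database name MyDb and the query_id/plan_id values are placeholders; the commands themselves are standard T-SQL on SQL Server 2016 and later):
-- Pin the optimizer's cost model and search space to an older release's behavior
-- (130 = the SQL Server 2016 level; pick the level you upgraded from).
ALTER DATABASE MyDb SET COMPATIBILITY_LEVEL = 130;
-- Turn on the plan "flight data recorder" so prior plans and their runtime stats are captured.
ALTER DATABASE MyDb SET QUERY_STORE = ON;
-- If a plan regresses after an upgrade, force a previously captured plan back.
-- The IDs come from the Query Store catalog views (sys.query_store_query, sys.query_store_plan).
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;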
(Apologies this is long-winded - the context is important).
For SQL Server 2019 + SQL Azure, the QP team has introduced a mechanism that inlines many T-SQL scalar functions. You can read the announcement here. There are still heuristics being tuned on this feature to make sure that there are few/no performance regressions compared to not inlining (meaning the QP will generally figure out when it is better to inline vs. not inline and only inline those cases). When inlined, the optimizer has the ability to reorder joins and consider various plan choice alternatives to get you a faster query plan. So, in the end, this uses normal relational operators inside the query processor and costs them that way.
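As a rough illustration of how the feature surfaces (assuming a SQL Server 2019 database at compatibility level 150, with dbo.funcOne standing in for the question's function):
-- Ask the engine whether it considers a scalar function inlineable.
SELECT OBJECT_NAME(object_id) AS func_name, is_inlineable
FROM sys.sql_modules
WHERE object_id = OBJECT_ID('dbo.funcOne');
-- If a particular workload regresses, inlining can be switched off database-wide...
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = OFF;
-- ...or per function, by adding WITH INLINE = OFF to the function definition.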
I hope that gives an explanation as to why things are probably slower for you now and gives you some hope that the SQL team is indeed working to improve this scenario as we speak. Best of luck on tuning your app.
Upvotes: 7
Reputation: 30565
Scalar-valued functions are generally bad practice in terms of performance. Let's say that you have a function:
CREATE FUNCTION fn_GetName(@CustomerID int)
RETURNS varchar(100)
AS
BEGIN
    DECLARE @myResult varchar(100);

    SELECT @myResult = Fullname
    FROM Sales s
    WHERE s.CustomerID = @CustomerID;

    RETURN @myResult;
END
and let's say that we are calling this function like
select
    dbo.fn_GetName(id)
from Student;
SQL Server interprets this function row by row, which causes a performance bottleneck. Inline table-valued functions, however, are not executed row by row:
CREATE FUNCTION fn_GetNameTable(@CustomerID int)
RETURNS TABLE
AS
RETURN (
SELECT Fullname
FROM Sales s
WHERE s.CustomerID = @CustomerID
)
Then,
SELECT I.Fullname
    ,S.CustomerType
FROM Sales S
CROSS APPLY fn_GetNameTable(S.CustomerID) AS I
is executed as native, set-based SQL: the function body is expanded into the outer query much like a view.
You can read more at this address.
Upvotes: 4
Reputation: 1269853
SQL Server just doesn't break queries down well across functions; there is a lot of overhead in calling a user-defined function. You may see some improvement with an inline table-valued function, but I would suggest that you use computed columns:
alter table mytable add funcone as ( <your case expression here> );
This works when everything the expression references comes from one table; a computed column cannot refer to columns of another table.
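For instance, a rough sketch against the question's schema (hypothetically assuming colA, colB, colC, valX, and valY all live on MyTable, since the question's expression actually spans two tables):
-- computed column evaluated per row with no function-call overhead
alter table MyTable add funcOneValue as (
    case
        when colA = colB then valX + valY
        when colC between 1 and 10 then valX
        else 0
    end
);
The SUM in the query can then aggregate funcOneValue directly, with no per-row function invocation.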
Upvotes: 0