Reputation: 1973
I want to understand the efficiency of a SQL query while calculating percentages of categories.
Let's say I have a table like this -
with temp as
(
select "a" as product, 100 as sales union all
select "b" as product, 200 as sales union all
select "c" as product, 300 as sales
)
Every product appears once and its sales besides it. Now, if i wish to calculate the percentage of sales for every product, which query is more efficient in SQL?
-- Query 1
with temp as
(
select "a" as product, 100 as sales union all
select "b" as product, 200 as sales union all
select "c" as product, 300 as sales
)
select *,
sales * 100 / sum(sales) over () as percentage_sales
from temp t
OR
-- Query 2
with temp as
(
select "a" as product, 100 as sales union all
select "b" as product, 200 as sales union all
select "c" as product, 300 as sales
)
, total as
(
select sum(sales) grand_total from temp
)
select product, sales,
sales * 100 / t.grand_total as percentage_sales
from temp, total t
Which of the queries is more efficient and optimized ? Is it based on SQL engine based on what we use? Or is there an overall rule of thumb here?
I am trying to understand it from a generic viewpoint and not pertaining to any DBMS vendor.
Upvotes: 0
Views: 123
Reputation: 1269953
In almost any database that supports window functions, the window functions are going to perform better than the subquery. Window functions were designed for this purpose. And if the functionality is going to be added into the database, it might as well perform at least as well as the alternatives. (That said, the same logic doesn't seem to apply to pivot
, but that is not standard functionality that would ever be compared across databases.)
Your example is particularly simple, so the two methods are likely to be very close. If you had a partition by
versus join
, then it would probably be hands down on anything but the smallest datasets.
I do agree with TheImpaler that for such questions you should test on your database and your data. But I would expect the window function to generally win outright, possibly being a tie because your example is a very simple use-case.
Upvotes: 1
Reputation: 48810
There's no generic answer. The performance greatly depends on the specific database engine capabilities.
The capabilities relate to the ability of rephrasing a query (rewriting equivalent ones), add optimizations, produce multiple query plans, and correctly assessing the cost of each one.
Moreover, the above is also influenced by the repertoire of database operations available in the engine. Simpler engines (e.g. MySQL) have far fewer operations where to choose from compared to strong ones (Oracle, DB2).
For some engines those two queries can actually be equivalent. The rewriting phase (ritgh after the parsing) may decide to rewrite it the other way, since that would produce the same result. Remember that's perfectly valid since SQL is a declarative language, not an imperative one.
So, bottom line: it greatly depends on the database. Why don't you retrieve the execution plan of each one in the database of your choosing? That will tell you something about it. Plus you'll learn a lot in the process.
Upvotes: 1