Reputation: 25834
I kind of have a feel for why the view is slower: The where clause is probably not applied at the same time. The results do seem to be the same, though. I am not sure what I can do about this, short of not using a view...which is not ideal, as I added the view to avoid code repetition, and I don't want to remove it if it isn't necessary.
Any suggestions for to change the way I am doing this so that I can use a view like in Command 1 but still have my query be executed as quickly as it is executed in command 2?
declare @foo varchar(50)
set @foo = 'be%'
ALTER VIEW [dbo].[wpvw_v]
AS
select distinct [name]
from kvgs kvg left join cdes cde
on kvg.kvgi = cde.kgi
group by [name], cde.kgi, kvg.mU
having count(cde.kgi) >= 2 or kvg.mU = 1 or
exists (select [name] from FP x where x.name = kvg.name)
--Command 1: Takes 7 seconds
select [name] from wpvw_v where name like @foo
--Command 2: Takes 1 second
SELECT DISTINCT kvg.name
FROM dbo.kvgs AS kvg LEFT JOIN
dbo.cdes AS cde ON kvg.kvgi = cde.kgi
where name like @foo
GROUP BY kvg.name, cde.kgi, kvg.mU
HAVING (COUNT(cde.kgi) >= 2) OR
(kvg.mU = 1) OR
EXISTS
(SELECT Name
FROM dbo.FP AS x
WHERE (Name = kvg.name))
Upvotes: 0
Views: 205
Reputation: 8734
Table-valued functions tend to be quicker than views, assuming your WHERE conditions are known and can be supplied in parameters.
One advantage of table-valued functions is that you can have multiple statements, so you can convert OUTER JOINs to quicker INNER JOINS in subsequent statements. So instead of this:
INSERT INTO @resultTable
table1_id,
table1_column,
table2_column,
table3_column
SELECT
table1.id,
table1.column,
table2.column,
table3.column
FROM
table1
INNER JOIN table2 ON table2.table1_id = table1.id
LEFT OUTER JOIN table3 ON table3.table1_id = table1.id
return @resultTable
... you can do this, which I find is always faster:
INSERT INTO @resultTable
table1_id,
table1_column,
table2_column,
SELECT
table1.id,
table1.column,
table2.column,
FROM
table1
INNER JOIN table2 ON table2.table1_id = table1.id
UPDATE @resultTable SET
table3_column = table3.column
FROM @resultTable AS result
INNER JOIN table3 ON table3.table1_id = result.table1_id
return @resultTable
Upvotes: 0
Reputation: 2702
I believe the following reproduces your problem:
create table tbl (idx int identity(1,1), name varchar(50), val float)
declare @cnt int
set @cnt=0
while @cnt < 10000
begin
insert tbl select char(CAST(rand()*256 AS INT)), rand()
set @cnt = @cnt + 1
end
go
create view tbl_view as select distinct name from tbl group by name having sum(val) > 1
Then if you run the following query:
SET STATISTICS IO ON
declare @n varchar(50)
set @n='w%'
select * from tbl_view where name like @n
SET STATISTICS IO OFF
GO
SET STATISTICS IO ON
declare @n varchar(50)
set @n='w%'
select distinct name from tbl where name like @n group by name having sum(val) > 1
SET STATISTICS IO OFF
You get the following:
(1 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tbl'. Scan count 1, logical reads 338, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
Table 'tbl'. Scan count 1, logical reads 338, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
The view forces it to work off a sub-table first and only then to apply the filter. Now, if you modify the view and remove the DISTINCT, this does not change. But if you modify the view to remove the group by:
create view tbl_view as select name from tbl where val > 0.8 group by name
go
SET STATISTICS IO ON
declare @n varchar(50)
set @n='w%'
select * from tbl_view where name like @n
SET STATISTICS IO OFF
GO
SET STATISTICS IO ON
declare @n varchar(50)
set @n='w%'
select name from tbl where val > 0.8 and name like @n group by name
SET STATISTICS IO OFF
Then you get the same results for both queries:
(1 row(s) affected)
Table 'tbl'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
Table 'tbl'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
So it does seem like the HAVING is the barrier.
Upvotes: 0
Reputation: 332731
I didn't think the HAVING clause could accommodate what you'd posted, but I believe your view should be written to use UNIONs instead. Here's my take on it:
ALTER VIEW [dbo].[wpvw_v] AS
WITH names AS(
SELECT k.name
FROM KVGS k
WHERE EXISTS(SELECT NULL
FROM CDES c
WHERE c.kgi = k.kvgi
GROUP BY c.kgi
HAVING COUNT(c.kgi) > 1)
UNION ALL
SELECT k.name
FROM KVGS k
WHERE k.mu = 1
GROUP BY k.name
UNION ALL
SELECT k.name
FROM KVGS k
JOIN FP x ON x.name = k.name
GROUP BY k.name)
SELECT n.name
FROM names n
If you want to filter out duplicates between the 3 SQL statements, change UNION ALL
to UNION
. Then you can use:
SELECT n.name
FROM wpvw_v n
WHERE CHARINDEX(@name, n.name) > 0
Upvotes: 1
Reputation: 294437
Your query from view is like this:
SELECT name FROM (SELECT DISTINCT name FROM ...) WHERE name = @name;
while the second one is:
SELECT DISTINCT name FROM ... WHERE name = @name;
The two queries are very different and even though they produce the same result, the fiurst one can be answered only if the entire table is scanned to produce the distinct names, while the second one can scan only the names you're interested in.
The gist of the problem is that the presence of DISTINCT
places a barrier that does not allow for the filtering predicate to move down the query tree to a place where is effective.
Update
Even if DISTINCT is not a barrier, on second look the second look there is a even more powerful barrier there: the GROUP BY/HAVING clause. One query filters after the GROUP and HAVING condition was applied, the other one before. And the HAVING condition has subqueries that reference name
again. I doubt the QO can proove the equivalence of the filtering before the aggregate and filtering after the aggregate.
Upvotes: 1
Reputation: 7184
Without seeing more (the CREATE TABLE, INDEX, and CONSTRAINT statements for each table, for example) and preferably seeing the query plans as some sample data representative of the cardinality of the join as well, it's hard to say.
Possibly, there is a semantic difference between the queries that has to do with the collation under which the LIKE expression is evaluated, and it might be impossible to coax the same plan.
However, there is probably plenty of room for query tuning here. It seems unlikely you need to fully aggregate all the COUNT()s. You have three rather distinct conditions under which you want to see a "name" in your result. With UNION you might be able to make one or more of them simpler to calculate, and if concurrency isn't an issue, you might even write this as a multi-step user-defined table-valued function that accumulates the names in separate steps.
Upvotes: 0
Reputation: 86789
You could try an inline tabled function (http://www.sqlhacks.com/index.php/Retrieve/Parameterized-View) but tbh I see that as a bit of a hack.
Honestly I'd probably go for the code repetition. I don't really see SQL in the same way that I see other code - I keep on seeing vast differences in performance between otherwise logically equivalent statements.
Upvotes: 0
Reputation: 31811
As far as I know, the full result set of the view is collected and then further widdled down by the SELECT statement that uses it. This is very different from your second SELECT statement, which doesn't collect any more than it needs.
Upvotes: 0