Reputation: 365
I'm using MS SqlServer 2008. And I have a table 'Users'. This table has the key field ID of bigint. And also a field Parents of varchar which encodes all chain of user's parent IDs. For example:
User table:
ID | Parents
1 | null
2 | ..
3 | ..
4 | 3,2,1
Here user 1 has no parents and user 4 has a chain of parents 3->2->1. I created a function which parses the user's Parents field and returns result table with user IDs of bigint.
Now I need a query which will select and join IDs of some requested users and theirs parents (order of users and theirs parents is not important). I'm not an SQL expert so all I could come up with is the following:
WITH CTE AS(
SELECT
ID,
Parents
FROM
[Users]
WHERE
(
[Users].Name = 'John'
)
UNION ALL
SELECT
[Users].Id,
[Users].Parents
FROM [Users], CTE
WHERE
(
[Users].ID in (SELECT * FROM GetUserParents(CTE.ID, CTE.Parents) )
))
SELECT * FROM CTE
And basically it works. But performance of this query is very poor. I believe WHERE .. IN .. expression here is a bottle neck. As I understand - instead of just joining the first subquery of CTE (ID's of found users) with results of GetUserParents (ID's of user parents) it has to enumerate all users in the Users table and check whether the each of them is a part of the function's result (and judging on execution plan - Sql Server does distinct order of the result to improve performance of WHERE .. IN .. statement - which is logical by itself but in general is not required for my goal. But this distinct order takes 70% of execution time of the query). So I wonder how this query could be improved or perhaps somebody could suggest some another approach to solve this problem at all?
Thanks for any help!
Upvotes: 1
Views: 624
Reputation: 35998
The recursive query in the question looks redundant since you already form the list of IDs needed in GetUserParents
. Maybe change this into SELECT
from Users
and GetUserParents()
with WHERE
/JOIN
.
select Users.*
from Users join
(select ParentId
from (SELECT * FROM Users where Users.Name='John') as U
cross apply [GetDocumentParents](U.ID, U.Family, U.Parents))
as gup
on Users.ID = gup.ParentId
Since GetDocumentParents
expects scalars and select... where
produces a table, we need to apply the function to each row of the table (even if we "know" there's only one). That's what apply
does.
I used indents to emphasize the conceptual parts of the query. (select...) as gup
is the entity Users
is join
'd with; (select...) as U cross apply fn()
is the argument to FROM
.
The key knowledge to understanding this query is to know how the cross apply
works:
FROM
clause (quite unexpectedly; so the syntax is at FROM (Transact-SQL))FROM
(i emphasized this with indent)The transformation is: for each row, it
ParentId
)
This is a cross apply
so rows will only be added if the function returns anything. If this was the other flavor, outer apply
, a single row would be added anyway, followed by a NULL
in the function's column if it returned nothing.
Upvotes: 1
Reputation: 35998
This "parsing" thing violates even the 1NF. Make Parents
field contain only the immediate parent (preferably, a foreign key), then an entire subtree can be retrieved with a recursive query.
Upvotes: 0