Kreol
Kreol

Reputation: 365

Hierarchical SQL select-query

I'm using MS SqlServer 2008. And I have a table 'Users'. This table has the key field ID of bigint. And also a field Parents of varchar which encodes all chain of user's parent IDs. For example:

User table:

ID | Parents
1  | null
2  | ..
3  | ..
4  | 3,2,1

Here user 1 has no parents and user 4 has a chain of parents 3->2->1. I created a function which parses the user's Parents field and returns result table with user IDs of bigint.

Now I need a query which will select and join IDs of some requested users and theirs parents (order of users and theirs parents is not important). I'm not an SQL expert so all I could come up with is the following:

WITH CTE AS(
SELECT
    ID,
    Parents
FROM
[Users]
WHERE
(
     [Users].Name = 'John'
)

UNION ALL

SELECT
    [Users].Id,
    [Users].Parents
FROM [Users], CTE
WHERE
(
    [Users].ID in (SELECT * FROM GetUserParents(CTE.ID, CTE.Parents) )
)) 
SELECT * FROM CTE

And basically it works. But performance of this query is very poor. I believe WHERE .. IN .. expression here is a bottle neck. As I understand - instead of just joining the first subquery of CTE (ID's of found users) with results of GetUserParents (ID's of user parents) it has to enumerate all users in the Users table and check whether the each of them is a part of the function's result (and judging on execution plan - Sql Server does distinct order of the result to improve performance of WHERE .. IN .. statement - which is logical by itself but in general is not required for my goal. But this distinct order takes 70% of execution time of the query). So I wonder how this query could be improved or perhaps somebody could suggest some another approach to solve this problem at all?

Thanks for any help!

Upvotes: 1

Views: 624

Answers (2)

ivan_pozdeev
ivan_pozdeev

Reputation: 35998

The recursive query in the question looks redundant since you already form the list of IDs needed in GetUserParents. Maybe change this into SELECT from Users and GetUserParents() with WHERE/JOIN.

select Users.*
from Users join
     (select ParentId
      from (SELECT * FROM Users where Users.Name='John') as U
           cross apply [GetDocumentParents](U.ID, U.Family, U.Parents))
     as gup
on Users.ID = gup.ParentId

Since GetDocumentParents expects scalars and select... where produces a table, we need to apply the function to each row of the table (even if we "know" there's only one). That's what apply does.

I used indents to emphasize the conceptual parts of the query. (select...) as gup is the entity Users is join'd with; (select...) as U cross apply fn() is the argument to FROM.

The key knowledge to understanding this query is to know how the cross apply works:

  • it's a part of a FROM clause (quite unexpectedly; so the syntax is at FROM (Transact-SQL))
  • it transforms the table expression left of it, and the result becomes the argument for the FROM (i emphasized this with indent)

The transformation is: for each row, it

  • runs a table expression right of it (in this case, a call of a table-valued function), using this row
  • adds to the result set the columns from the row, followed by the columns from the call. (In our case, the table returned from the function has a single column named ParentId)
    • So, if the call returns multiple rows, the added records will be the same row from the table appended with each row from the function.

This is a cross apply so rows will only be added if the function returns anything. If this was the other flavor, outer apply, a single row would be added anyway, followed by a NULL in the function's column if it returned nothing.

Upvotes: 1

ivan_pozdeev
ivan_pozdeev

Reputation: 35998

This "parsing" thing violates even the 1NF. Make Parents field contain only the immediate parent (preferably, a foreign key), then an entire subtree can be retrieved with a recursive query.

Upvotes: 0

Related Questions