SQL_Guy
SQL_Guy

Reputation: 353

Challenging recursive T-SQL query

Here is my scenario. Let's say I have two tables "Car" and "CarPart". The car consist of many parts and each part can belong to multiple cars. One complication in my case is that each part gets a new PartID, even if it's the same part name, but it simply belongs to a different car. This is something I have no control over, so just bear with me. Here is the script to set things up.

IF OBJECT_ID('Car') IS NOT NULL DROP TABLE Car
CREATE TABLE Car (
CarID INT,
CarName VARCHAR(16)
)

IF OBJECT_ID('CarPart') IS NOT NULL DROP TABLE CarPart
CREATE TABLE CarPart (
PartID INT,
PartName VARCHAR(16),
CarID INT
)

INSERT INTO Car
VALUES (1, 'Chevy'),
    (2, 'Ford'),
    (3, 'Toyota'),
    (4, 'Honda'),
    (5, 'Nissan'),
    (6, 'Hugo')

INSERT INTO CarPart 
VALUES  (110, 'Engine', 1),
  (120, 'Engine', 2),
  (210, 'Door', 1),
  (220, 'Door', 3),
  (310, 'Seat', 4),
  (320, 'Seat', 5),
  (410, 'Window', 3),
  (510, 'Wheel', 2),
  (420, 'Window', 6)

As you can see, the part "Engine" belongs to both "Chevy" and "Ford" and is listed twice with different IDs. Once again, this is a design limitation I have to live with.

Here is what I need to accomplis: given a car, I need to find all of the parts for this car and all of the other cars that these parts belong to. I have to continue finding parts and cars in a recursive manner until I reach the end of the chain. The logic can be outlined as follows: @StartCar --> Parts of a @StartCar --> Other parts by the same name --> get Id's of those "other" parts --> Get cars which "own" those parts --> start over and repeat until the end of the chain is reached.

To solve my problem, I tried this query:

DECLARE @StartCar VARCHAR(16) = 'Chevy'

;WITH cte (CarName, PartName)
AS
(
SELECT c.CarName,
       cp.PartName
FROM CarPart cp
JOIN Car c ON cp.CarID = c.CarID
WHERE c.CarName = @StartCar
UNION ALL
SELECT c.CarName,
       cp.PartName
FROM CarPart cp
JOIN Car c ON cp.CarID = c.CarID
JOIN cte cte ON cp.PartName = cte.PartName
)
SELECT CarName, PartName
FROM cte

However, it gets into an infinite loop and terminates. I would expect see the output similar to this:

CarName PartName
Chevy Engine
Chevy Door
Ford Engine
Ford Wheel
Toyota Door
Toyota Window
Hugo Window

I appreciante any pointers.

Thank you!

Upvotes: 5

Views: 421

Answers (4)

Steve Kass
Steve Kass

Reputation: 7184

You are basically traversing a graph that's not acyclic, so you must explicitly avoid cycles. One way is to keep track of the path in the graph. Here's code that should work. You could use SQL Server's HIERARCHYID data type to hold the paths, also.

I chose to make the CTE a table of cars instead of a table of parts. Your rule never results in some, but not all, parts for a specific car, so this seemed simpler.

WITH cte(CarID,hier) AS (
  SELECT CarID, CAST('/'+LTRIM(CarID)+'/' AS varchar(max))
  FROM Car
  WHERE CarName = @StartCar

  UNION ALL

  SELECT c2.CarID, hier+LTRIM(c2.CarID)+'/'
  FROM Car AS c
  JOIN cte ON cte.CarID = c.CarID
  JOIN CarPart AS c1 ON c.CarID = c1.CarID
  JOIN CarPart AS c2 ON c2.PartName = c1.PartName
  WHERE hier NOT LIKE '%/'+LTRIM(c2.CarID)+'/%'
)

SELECT
  c.CarName, cp.PartName
FROM Car AS c
JOIN CarPart AS cp ON cp.CarID = c.CarID
JOIN cte on cte.CarID = c.CarID

Upvotes: 2

Sebastian Meine
Sebastian Meine

Reputation: 11813

The reason your cte goes into an endless loop is because you do not define a hierarchy. If you draw you relationship diagram you will see many circles causing any walkthrough to loop forever.

To solve that, the first thing is to create a hierarchy. In my code the first cte car_hierarchy does that by finding all CarID pairs but restricting that the left one must be smaller than the right on. With that you now have a circle-free directed relationship graph. (If you ignore the direction you might still find circles, but that doesn't matter to the algorithm.)

The second step is to find all relatives for a given car. Because the car given might not sit at the end of the hierarchy this is a two step process. First find the left most connected car, than find all connected car stating from there. car_left and car_right in the query below do just that.

The final step is to take the IDs and pull the car and part names back in:

IF OBJECT_ID('dbo.Car') IS NOT NULL DROP TABLE dbo.Car
CREATE TABLE dbo.Car (
CarID INT,
CarName VARCHAR(16)
)

IF OBJECT_ID('dbo.CarPart') IS NOT NULL DROP TABLE dbo.CarPart
CREATE TABLE dbo.CarPart (
PartID INT,
PartName VARCHAR(16),
CarID INT
)

INSERT INTO dbo.Car
VALUES (1, 'Chevy'),
    (2, 'Ford'),
    (3, 'Toyota'),
    (4, 'Honda'),
    (5, 'Nissan'),
    (6, 'Hugo')

INSERT INTO dbo.CarPart 
VALUES  (110, 'Engine', 1),
  (120, 'Engine', 2),
  (210, 'Door', 1),
  (220, 'Door', 3),
  (310, 'Seat', 4),
  (320, 'Seat', 5),
  (410, 'Window', 3),
  (510, 'Wheel', 2),
  (420, 'Window', 6)


DECLARE @StartCarID INT = 1;

WITH 
car_hierachy (CarID1, CarID2) AS (
  SELECT DISTINCT
         cp1.CarID CarID1,
         cp2.CarID CarID2
  FROM dbo.CarPart cp1
  JOIN dbo.CarPart cp2
  ON cp1.PartName = cp2.PartName
  AND cp1.CarID < cp2.CarID
),
car_left(CarID) AS (
  SELECT @StartCarID 
  UNION ALL
  SELECT ch.CarID1
  FROM car_hierachy ch
  JOIN car_left cl
  ON cl.CarID = ch.CarID2
),
car_right(CarID) AS (
  SELECT MIN(CarID) 
  FROM car_left
  UNION ALL
  SELECT ch.CarID2
  FROM car_hierachy ch
  JOIN car_right cr
  ON cr.CarID = ch.CarID1
)
SELECT *
FROM car_right ac
JOIN dbo.Car c
ON ac.CarID = c.CarID
JOIN dbo.CarPart cp
ON c.CarID = cp.CarID
ORDER BY c.CarId, cp.PartId; 

SQLFiddle

This should solve your problem. However I am not sure it will perform well. With a large dataset you might actually be better of using a loop. But with appropriate indexing it might just work. So give it a try.

(I switched the start car from chevy to Toyota to show that it works for ars in the middle of the hierarchy too. If you walk only outwards from Toyota you will miss Ford.)

Upvotes: 2

shibormot
shibormot

Reputation: 1638

SQL Fiddle

Query 1:

declare @t table (
  car_name varchar(100), 
  part_name varchar(100)
  )

declare @car int = 3

insert @t
select c.CarName, p.PartName
from Car c join CarPart p on c.CarID = p.CarID
where c.CarID = @car


while exists(
  select c.CarName, p.PartName
  from Car c join CarPart p on c.CarID = p.CarID
  where c.CarName in (
    select c.CarName
    from Car c join CarPart p on c.CarID = p.CarID
    where p.PartName in (select part_name from @t)
      and c.CarName not in (select car_name from @t)
    )
) 
insert @t
  select c.CarName, p.PartName
  from Car c join CarPart p on c.CarID = p.CarID
  where c.CarName in (
    select c.CarName
    from Car c join CarPart p on c.CarID = p.CarID
    where p.PartName in (select part_name from @t)
      and c.CarName not in (select car_name from @t)
    )

select * from @t

Results:

| CAR_NAME | PART_NAME |
------------------------
|   Toyota |      Door |
|   Toyota |    Window |
|    Chevy |    Engine |
|    Chevy |      Door |
|     Hugo |    Window |
|     Ford |    Engine |
|     Ford |     Wheel |

Upvotes: 2

Andrew Morton
Andrew Morton

Reputation: 25067

It looks like you need a junction table so that you have car->car parts with serial numbers->car part names. The CarPart table should not have the part serial number in it.

Upvotes: 0

Related Questions