Nick
Nick

Reputation: 4766

Is there a way to get data from 2 tables without creating a Cartesian product?

In our database a customer can have any number of drivers, any number of vehicles, any number of storage locations, any number of buildings at those locations, any number of comments, and so on. I need a query that returns all of the customer's information and right now the query is something like:

SELECT *
FROM Customer c
INNER JOIN Driver d ON c.ID = d.CustomerID
INNER JOIN Vehicle v ON c.ID = v.CustomerID

The more that a customer has the bigger the result gets, and it grows exponentially because a cartesian product is being created here. 3 drivers, 3 vechiles creates 9 rows, and this is a very small example compared to what our real data is like. We actually have 10 different tables that can hold as many rows per customer as they want. The norm is 2-7 rows at least per table per customer. we have had as many as 60,000,000+ rows returned (6 items each in 10 different tables, 6^10 = 60,466,176) and for our purposes 6 rows total would have given us all the data we needed if we could just stick the 6 rows in each table together.

so in the smaller example, if 1 customer had 2 vehicles and 3 drivers and another customer had 2 vehicles and 1 drivers i would want a result set that looked like:

CustomerID  | DriverID | VehicleID
1           | 1        | 1
1 (or NULL) | 2        | 2
1 (or NULL) | NULL     | 3
2           | 3        | 4
2 (or NULL) | 4        | NULL

Instead our query that joins every table together on CustomerID looks like this:

CustomerID | DriverID | VehicleID
1          | 1        | 1
1          | 1        | 2
1          | 1        | 3
1          | 2        | 1
1          | 2        | 2
1          | 2        | 3
2          | 3        | 4
2          | 4        | 4

Really, what I want to do is just:

SELECT * FROM Driver
SELECT * FROM Vehicle

Because all we are doing with the data is looping through the rows and formatting the information in a document. All drivers are listed, then all vehicles are listed. It makes no sense to do this crazy huge join when we don't have to, but it's just an arbitrary requirement that it must return all the data in 1 result set from a stubborn superior who refuses to listen to reason. Since the columns are different a UNION isn't possible. i'm just hoping there's a way to stick them together horizontally instead of vertically.

Also, I'm using Microsoft SQL Server.

Upvotes: 0

Views: 2434

Answers (2)

Nick
Nick

Reputation: 4766

Here's how I'm doing it. Instead of:

SELECT *
FROM Customer c
INNER JOIN Driver d ON c.ID = d.CustomerID
INNER JOIN Vehicle v ON c.ID = v.CustomerID

I'm doing:

WITH CustomerCTE AS
(
  SELECT 1 ROW_NUM, ID
  FROM Customer
),
DriverCTE AS
(
  SELECT ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY ID) ROW_NUM, *
  FROM Driver
),
VehicleCTE AS
(
  SELECT ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY ID) ROW_NUM, *
  FROM Vehicle
)
SELECT *
FROM CustomerCTE c
FULL OUTER JOIN DriverCTE d ON c.ID = d.CustomerID AND c.ROW_NUM = d.ROW_NUM
FULL OUTER JOIN VehicleCTE v ON d.CustomerID = v.CustomerID AND d.ROW_NUM = v.ROW_NUM
ORDER BY
CASE WHEN c.ID IS NOT NULL THEN c.ID ELSE
  CASE WHEN d.CustomerID IS NOT NULL THEN d.CustomerID ELSE
    v.CustomerID
  END
END,
CASE WHEN c.ROW_NUM IS NOT NULL THEN c.ROW_NUM ELSE
  CASE WHEN d.ROW_NUM IS NOT NULL THEN d.ROW_NUM ELSE
    v.ROW_NUM
  END
END

Now if a customer has 3 drivers and 3 vehicles i get 3 rows instead of 9 rows. It makes it look like each driver is associated to 1 of the 3 vehicles, but it's actually not. Again, this is bad design, but it is necessary to cut down on the number of rows returned with the unreasonable restrictions I was given.

It looks like more work than webturner's answer, but in my real case where I have to join 10 different tables with over 500 columns its a lot less work to do it this way than to explicitly name all 500 columns and fill in all of the remaining columns from each table with NULL.

Though, this may not be of much use to most people. In most cases if you're doing something like this you probably need to rethink your design, but there may be some cases where you have no choice.

Upvotes: 1

Stephen Turner
Stephen Turner

Reputation: 7314

It's an ugly hack, but you know your proper solution is just as you state:

SELECT * FROM Driver
SELECT * FROM Vehicle

Instead you could use a union query and blank out the columns from the other tables, just start it with a query that sets the type and names of the columns, with a false coldition so it doesn't return a row:

SELECT 1 AS DriverID, "" AS DriverName, 1 AS VehicleID, "" AS VehicleName WHERE 1=0 
UNION SELECT DriverID, DriverName, NULL, NULL FROM Driver
UNION SELECT NULL, NULL, VehicleID, VehicleName FROM Driver

Really, really bad code! Keep working on your superior to allow a better solution.

Upvotes: 3

Related Questions