user2014025
user2014025

Reputation: 93

sql avoid cartesian product

I'm pretty new to SQL and am struggling with a query (using Access, FWIW). I have Googled and searched StackOverflow, but I haven't seen this exact scenario. (That could also be because I don't know the correct search terms.)

I have two pretty simple tables that contain similar data.

table1: state, lname, fname, network
table2: state, lname, fname, network

What I want is to find each person/state combo that match in the two tables plus the networks from each table that the person is in:

state, lname, fname, t1.network, t2.network.  

The person may be in more than one network in each table. I want to see each network (from both tables) that the person belongs to.

I started by using a JOIN as below:

SELECT t1.state, t1.lname, t1.fname, t1.network, t2.network
FROM t1 INNER JOIN t2 
ON t1.fname=t2.fname AND t1.lname=t2.lname AND t1.state=t2.state
GROUP BY t1.state, t1.lname, t1.fname, t1.network, t2.network

I quickly figured out that I get a Cartesian product. So if "NY, Smith, John" was in two networks in t1 and three networks in t2 I would get something like this:

NY, Smith, John, NetworkA, NetworkB
NY, Smith, John, NetworkA, NetworkA
NY, Smith, John, NetworkB, NetworkA
NY, Smith, John, NetworkB, NetworkB
NY, Smith, John, NetworkA, NetworkC
NY, Smith, John, NetworkB, NetworkC

What I really want to see is just:

NY, Smith, John, NetworkA, NetworkA
NY, Smith, John, NetworkB, NetworkB
NY, Smith, John, NULL, NetworkC

Can anyone give me some advice on how to proceed or point me in the right direction?

Upvotes: 9

Views: 35271

Answers (5)

sgeddes
sgeddes

Reputation: 62831

So it looks like you want all records from each of tables that are identical, and then only those from each that are distinct. That means you need to UNION 3 sets of queries.

Try something like this:

SELECT t1.state, 
   t1.lname, 
   t1.fname, 
   t1.network as t1Network, 
   t2.network as t2Network
FROM table1 t1 
   INNER JOIN table2 t2 
      ON t1.fname=t2.fname 
      AND t1.lname=t2.lname 
      AND t1.state=t2.state
      AND t1.network=t2.network
UNION 
SELECT t1.state, 
   t1.lname, 
   t1.fname, 
   t1.network as t1Network, 
   t2.network as t2Network
FROM table1 t1 
   LEFT JOIN table2 t2 
      ON t1.fname=t2.fname 
      AND t1.lname=t2.lname 
      AND t1.state=t2.state
      AND t1.network=t2.network
WHERE t2.network IS NULL
UNION 
SELECT t2.state, 
   t2.lname, 
   t2.fname, 
   t1.network as t1Network, 
   t2.network as t2Network
FROM table2 t2 
   LEFT JOIN table1 t1
      ON t1.fname=t2.fname 
      AND t1.lname=t2.lname 
      AND t1.state=t2.state
      AND t1.network=t2.network
WHERE t1.network IS NULL

This should give you your desired results.

And here is the SQL Fiddle to confirm.

--EDIT

Not thinking today -- you don't really need that first query. You can remove the WHERE condition from the 2nd query and it works the same way. Tired :-)

Here is the updated query -- both should work just fine though, this is just easier to read:

SELECT t1.state, 
   t1.lname, 
   t1.fname, 
   t1.network as t1Network, 
   t2.network as t2Network
FROM table1 t1 
   LEFT JOIN table2 t2 
      ON t1.fname=t2.fname 
      AND t1.lname=t2.lname 
      AND t1.state=t2.state
      AND t1.network=t2.network
UNION 
SELECT t2.state, 
   t2.lname, 
   t2.fname, 
   t1.network as t1Network, 
   t2.network as t2Network
FROM table2 t2 
   LEFT JOIN table1 t1
      ON t1.fname=t2.fname 
      AND t1.lname=t2.lname 
      AND t1.state=t2.state
      AND t1.network=t2.network
WHERE t1.network IS NULL

And the updated fiddle.

BTW -- these should both work in MSAccess as it supports UNION.

Good luck.

Upvotes: 2

chue x
chue x

Reputation: 18803

One option is to use a full outer join:

SELECT coalesce(t1.state, t2.state), coalesce(t1.lname, t2.lname), coalesce(t1.fname, t2.fname),
  t1.network, t2.network
FROM t1 FULL OUTER JOIN t2 
ON t1.fname=t2.fname AND t1.lname=t2.lname AND t1.state=t2.state
and t1.network = t2.network

Output:

NY  SMITH   JOHN    A   A
NY  SMITH   JOHN    B   B
NY  SMITH   JOHN    NULL    C  

Edit: I didn't see you are using Access - this is standard SQL, but I don't know if it will work there.

Upvotes: 0

Paulb
Paulb

Reputation: 1531

What you want is a UNION.

I'd create a query (in SQL view) to the effect of:

SELECT state, lname, fname, network, network
FROM t1
UNION ALL
SELECT state, lname, fname, network, network
FROM t2

Then I would create other queries that query that one.

I'd try it myself to give you more details, but I don't have MS Access installed.

Upvotes: 0

user359040
user359040

Reputation:

In this case, a UNION should be your best bet:

SELECT * FROM t1
UNION
SELECT * FROM t2

(Note that this query should only work as is if all the columns in both tables are of the same type, in the same order - otherwise it's better to specify each required column, rather than using SELECT *).

Upvotes: 0

Aniket Inge
Aniket Inge

Reputation: 25705

SELECT DISTINCT t1.state, t1.lname, t1.fname, t1.network, t2.network
FROM t1 INNER JOIN t2 
ON t1.fname=t2.fname AND t1.lname=t2.lname AND t1.state=t2.state
GROUP BY t1.state, t1.lname, t1.fname, t1.network, t2.network

and you get what you want.

Almost every join starts off building a cartesian product anyway(see Inside SQL Server: Querying book). When you try to filter out the data, usually, the virtual tables created will return a distinct set when you use a distinct. In reality, there is no stoping the cartesian product.

Upvotes: 0

Related Questions