Adam
Adam

Reputation: 29079

Did two LEFT JOIN used to produce a cartesian product?

I am reading the book SQL Antipattern from Bill Karwin. In Chapter 18 he discusses a bad complex query on this example:

SELECT p.product_id, 
COUNT(f.bug_id) AS count_fixed, 
COUNT(o.bug_id) as count_open 
FROM BugsProducts p
LEFT Outer JOIN (BugsProducts bpf JOIN Bugs f Using (bug_id))
   ON (p.bug_id = f.bug_id AND f.status = 'FIXED')
LEFT OUTER JOIN (BugsProducts bpo JOIN Bugs o Using (bug_id)) 
   ON (p.bug_id = o.bug_id AND o.status = 'OPEN')
WHERE p.product_id = 1
GROUP BY p.product_id

He claims the following:

You happen to know that in reality there are eleven fixed bugs and seven open bugs for the given product. So the Result of the query is puzzeling:

product_id | count_fixed | count_open 
    1      |    77       |     77

Its then explained that this happens because the result is the cartesian product between fixed and open bugs.

I did not understand why this should happen and I rebuild-ed this query with MySQL 5.7.25.

The result was surprisingly

product_id | count_fixed | count_open 
    1      |    11       |     7

Also one can simply replace (BugsProducts bpf JOIN Bugs f Using (bug_id)) with Bugs f and (BugsProducts bpf JOIN Bugs o Using (bug_id)) with Bugs o.

Why is it claimed that the query should do a cartesian product? Is the query only returning the 11/7 result because of some specialty in MySQL that would not work in other DB's?

Upvotes: 1

Views: 3172

Answers (1)

Jannes Botis
Jannes Botis

Reputation: 11242

Two left join can produce a Cartesian product occasionally.

In this case, the query does not make sense at all. Most probable is that it is a mistake.

Try to remove GROUP BY p.product_id and change the select clause as:

SELECT p.product_id, 
f.bug_id AS bug1Id, 
o.bug_id as bug2Id 

This way, it is more obvious what is the result set.

I assume we have the following tables:

  • Products(product_id)
  • Bugs(bug_id, status)
  • BugsProducts(bug_id, product_id)

where BugsProducts is the join table between Products and Bugs

With the query, it is trying to

  • get all BugsProducts (p) rows for product_id = 1
  • joined 1st with (BugsProducts - Bugs inner join) with same bug_id as (p), so joining each row of BugsProducts to itself and also Bugs table if status = 'FIXED', so to summarize if the bug row has status = 'FIXED', it is joined to p(which is to itself), otherwise no row is joined (you see NULL to bug1Id)
  • same with the 2nd join but with condition status = 'OPEN'

Anyway, I believe what the author wanted to demonstrate is something like:

SELECT p.product_id, 
COUNT(f.bug_id) AS count_fixed, 
COUNT(o.bug_id) as count_open 
FROM Products p
LEFT Outer JOIN (BugsProducts bpf JOIN Bugs f Using (bug_id))
   ON (bpf.product_id = p.product_id AND f.status = 'FIXED')
LEFT OUTER JOIN (BugsProducts bpo JOIN Bugs o Using (bug_id)) 
   ON (bpo.product_id = p.product_id AND o.status = 'OPEN')
WHERE p.product_id = 1
Group by p.product_id

which results in a Cartesian product.

Upvotes: 1

Related Questions