DBE7
DBE7

Reputation: 816

Joining on multiple tables causing incorrect results

I am trying to extract some data grouped by the markets we operate in. The table structure looks like this:

bks:
opportunity_id 

bks_opps:
opportunity_id | trip_start | state

bts:
boat_id | package_id

pckgs:
package_id | boat_id

addresses:
addressable_id | district_id

districts:
district_id

What I wanted to do is to count the number of won, lost and total and percentage won for each district.

SELECT          d.name AS "District",
                SUM(CASE WHEN bo.state IN ('won') THEN 1 ELSE 0 END) AS "Won",
                SUM(CASE WHEN bo.state IN ('lost') THEN 1 ELSE 0 END) AS "Lost",
                Count(bo.state) AS "Total",
                Round(100 * SUM(CASE WHEN bo.state IN ('won') THEN 1 ELSE 0 END) / Count(bo.state)) AS "% Won"              
FROM bks b
INNER JOIN bks_opps bo ON bo.id = b.opportunity_id
INNER JOIN pckgs p ON p.id = b.package_id
INNER JOIN bts bt ON bt.id = p.boat_id
INNER JOIN addresses a ON a.addressable_type = 'Boat' AND a.addressable_id = bt.id
INNER JOIN districts d ON d.id = a.district_id
WHERE bo.trip_start BETWEEN '2016-05-12' AND '2016-06-12'
GROUP BY d.name;

This returns incorrect data (The values are way higher than expected). However, when I get rid of all the joins and stop grouping by district - the numbers are correct (Counting the toal # of opportunities). Anybody that can spot what I am doing wrong? The most related question on here is this one.

Example data:

 District | won | lost | total 
----+---------+---------+------

  1 |       42 |    212 |   254

Expected data:

District | won | lost | total |
       ----+---------+---------+--
        1 |  22 |    155 |   177

Upvotes: 0

Views: 335

Answers (1)

Aaron Dietz
Aaron Dietz

Reputation: 10277

Formatted comment here:

I would venture a guess that one of your join conditions is at fault here, but with the provided structure it is impossible to say.

For instance, you have this join INNER JOIN pckgs p ON p.id = b.package_id, but package_id is not listed as a column in bks.

And these joins look especially suspect:

INNER JOIN pckgs p ON p.id = b.package_id
INNER JOIN bts bt ON bt.id = p.boat_id

If a boat can exist in multiple packages, it will be an issue.

To troubleshoot, start with the simplest query you can:

SELECT b.opportunity_id
FROM   bks b

Then leave the select alone, and proceed to add in each join:

SELECT b.opportunity_id
FROM   bks b
INNER JOIN pckgs p ON p.id = b.package_id

At some point you'll likely see a jump in the number of rows returned. Whichever JOIN you added last is your issue.

Upvotes: 2

Related Questions