Reputation: 816
I am trying to extract some data grouped by the markets we operate in. The table structure looks like this:
bks:
opportunity_id
bks_opps:
opportunity_id | trip_start | state
bts:
boat_id | package_id
pckgs:
package_id | boat_id
addresses:
addressable_id | district_id
districts:
district_id
What I wanted to do is to count the number of won, lost and total and percentage won for each district.
SELECT d.name AS "District",
SUM(CASE WHEN bo.state IN ('won') THEN 1 ELSE 0 END) AS "Won",
SUM(CASE WHEN bo.state IN ('lost') THEN 1 ELSE 0 END) AS "Lost",
Count(bo.state) AS "Total",
Round(100 * SUM(CASE WHEN bo.state IN ('won') THEN 1 ELSE 0 END) / Count(bo.state)) AS "% Won"
FROM bks b
INNER JOIN bks_opps bo ON bo.id = b.opportunity_id
INNER JOIN pckgs p ON p.id = b.package_id
INNER JOIN bts bt ON bt.id = p.boat_id
INNER JOIN addresses a ON a.addressable_type = 'Boat' AND a.addressable_id = bt.id
INNER JOIN districts d ON d.id = a.district_id
WHERE bo.trip_start BETWEEN '2016-05-12' AND '2016-06-12'
GROUP BY d.name;
This returns incorrect data (The values are way higher than expected). However, when I get rid of all the joins and stop grouping by district - the numbers are correct (Counting the toal # of opportunities). Anybody that can spot what I am doing wrong? The most related question on here is this one.
Example data:
District | won | lost | total
----+---------+---------+------
1 | 42 | 212 | 254
Expected data:
District | won | lost | total |
----+---------+---------+--
1 | 22 | 155 | 177
Upvotes: 0
Views: 335
Reputation: 10277
Formatted comment here:
I would venture a guess that one of your join conditions is at fault here, but with the provided structure it is impossible to say.
For instance, you have this join INNER JOIN pckgs p ON p.id = b.package_id
, but package_id
is not listed as a column in bks
.
And these joins look especially suspect:
INNER JOIN pckgs p ON p.id = b.package_id
INNER JOIN bts bt ON bt.id = p.boat_id
If a boat can exist in multiple packages, it will be an issue.
To troubleshoot, start with the simplest query you can:
SELECT b.opportunity_id
FROM bks b
Then leave the select alone, and proceed to add in each join:
SELECT b.opportunity_id
FROM bks b
INNER JOIN pckgs p ON p.id = b.package_id
At some point you'll likely see a jump in the number of rows returned. Whichever JOIN
you added last is your issue.
Upvotes: 2