Reputation: 50318
Here is a simplified version of my problem. I have two tables. Each table has a unique ID field, but it's irrelevant in this case.
shipments has 3 fields: shipment_id, receive_by_datetime, and qty.
deliveries has 4 fields: delivery_id, shipment_id, delivered_on_datetime, and qty.
In shipments, the shipment_id and receive_by_datetime fields always match up. There are many rows in the table that would appear to be duplicates based on those two columns (but they aren't; other fields are different).
In deliveries, the shipment_id matches up to the shipments table. There are also many rows that would appear to be duplicates based on the delivery_id and delivered_on_datetime fields (but again they aren't; other fields exist that I didn't list).
I am trying to pull one row per aggregate delivered_on_datetime and receive_by_datetime, but because of the many-to-many relationships, it's difficult. Is a query somewhere along these lines correct?
SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty)
FROM deliveries d
LEFT JOIN (
SELECT DISTINCT s1.shipment_id, s1.receive_by_datetime
FROM shipments s1
) s ON (s.shipment_id = d.shipment_id)
GROUP BY d.delivered_on_datetime, s.receive_by_datetime
Upvotes: 0
Views: 3083
Reputation: 8703
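You can run into problems where the total SUM(d.qty) is larger than the value from SELECT SUM(qty) FROM deliveries. That happens whenever the DISTINCT subquery returns more than one row for a shipment_id, because each matching delivery row is then counted once per subquery row.
Something like this might be better suited for you: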
-- Note: s.qty is the per-subquery-row shipment total, so it repeats once per joined delivery row.
SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty) AS delivered_qty, SUM(s.qty) AS shipped_qty
FROM deliveries d
LEFT JOIN (
SELECT s1.shipment_id, s1.receive_by_datetime, SUM(s1.qty) AS qty
FROM shipments s1
GROUP BY s1.shipment_id, s1.receive_by_datetime
) s ON (s.shipment_id = d.shipment_id)
GROUP BY d.delivered_on_datetime, s.receive_by_datetime
If you have (or might have) a shipment_id with multiple values for receive_by_datetime (and it's best practice to assume that something may have corrupted the data slightly), you can prevent the rows from the deliveries table being duplicated, while still returning a valid result, with:
-- Note: s.qty is the per-shipment total, so it repeats once per joined delivery row.
SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty) AS delivered_qty, SUM(s.qty) AS shipped_qty
FROM deliveries d
LEFT JOIN (
SELECT s1.shipment_id, MAX(s1.receive_by_datetime) AS receive_by_datetime, SUM(s1.qty) AS qty
FROM shipments s1
GROUP BY s1.shipment_id
) s ON (s.shipment_id = d.shipment_id)
GROUP BY d.delivered_on_datetime, s.receive_by_datetime
Upvotes: 2
Reputation: 8598
Yep, the problem with many-to-many is that you get the cartesian product of the matching rows, so you end up counting the same row more than once: once for each row it matches on the other side.
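The double counting is easy to reproduce. The following is only a sketch with invented sample data (SQLite via Python's sqlite3, hypothetical function names), where shipment 1 has two distinct receive_by_datetime values, so every delivery for shipment 1 matches two subquery rows.

```python
import sqlite3

def make_conn():
    """In-memory database with hypothetical rows; shipment 1 has two dates."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE shipments (shipment_id INTEGER, receive_by_datetime TEXT, qty INTEGER);
        CREATE TABLE deliveries (delivery_id INTEGER, shipment_id INTEGER,
                                 delivered_on_datetime TEXT, qty INTEGER);
        INSERT INTO shipments VALUES
            (1, '2020-01-10', 5), (1, '2020-01-12', 5), (2, '2020-01-15', 8);
        INSERT INTO deliveries VALUES
            (100, 1, '2020-01-09', 4), (101, 1, '2020-01-09', 6), (102, 2, '2020-01-14', 8);
    """)
    return conn

# The query from the question: DISTINCT does not guarantee one row per shipment_id.
NAIVE_QUERY = """
    SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty)
    FROM deliveries d
    LEFT JOIN (
        SELECT DISTINCT s1.shipment_id, s1.receive_by_datetime
        FROM shipments s1
    ) s ON s.shipment_id = d.shipment_id
    GROUP BY d.delivered_on_datetime, s.receive_by_datetime
"""

def naive_total(conn):
    """Grand total of SUM(d.qty) across all groups of the naive query."""
    return sum(qty for _, _, qty in conn.execute(NAIVE_QUERY))

def true_total(conn):
    """Plain sum of qty over the deliveries table."""
    return conn.execute("SELECT SUM(qty) FROM deliveries").fetchone()[0]
```

With this data the naive total comes out larger than the true total, because deliveries 100 and 101 are each counted twice.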
"In shipments, the shipment_id and receive_by_datetime fields always match up"
If this means there cannot be two shipments with the same ID but different dates, then your query will work. In general, though, it is not safe: if the DISTINCT subselect could return more than one row per shipment ID, you will be subject to the double-counting issue. In general this is a very tricky problem to solve; in fact, I see no way it could be solved with this data model.
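One way to check whether that assumption actually holds in your data is to look for shipment IDs with more than one distinct date. This is a rough sketch (SQLite through Python's sqlite3; the sample rows and the function name ambiguous_shipment_ids are hypothetical):

```python
import sqlite3

def ambiguous_shipment_ids(conn):
    """Return shipment_ids that map to more than one receive_by_datetime."""
    rows = conn.execute("""
        SELECT shipment_id
        FROM shipments
        GROUP BY shipment_id
        HAVING COUNT(DISTINCT receive_by_datetime) > 1
        ORDER BY shipment_id
    """).fetchall()
    return [shipment_id for (shipment_id,) in rows]

def demo_conn():
    """Hypothetical data: shipment 1 has two dates, shipment 2 has one (repeated)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE shipments (shipment_id INTEGER, receive_by_datetime TEXT, qty INTEGER);
        INSERT INTO shipments VALUES
            (1, '2020-01-10', 5), (1, '2020-01-12', 5),
            (2, '2020-01-15', 8), (2, '2020-01-15', 3);
    """)
    return conn
```

If the list comes back empty, the DISTINCT subselect yields exactly one row per shipment ID and the original query is safe for the current data, though nothing stops a bad row from appearing later.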
Upvotes: 1