James Skidmore

Reputation: 50318

How to join table in many-to-many relationship?

Here is a simplified version of my problem. I have two tables. Each table has a unique ID field, but it's irrelevant in this case.

shipments has 3 fields: shipment_id, receive_by_datetime, and qty.

deliveries has 4 fields: delivery_id, shipment_id, delivered_on_datetime, and qty.

In shipments, the shipment_id and receive_by_datetime fields always match up. There are many rows in the table that would appear to be duplicates based on those two columns (but they aren't; other fields are different).

In deliveries, the shipment_id matches up to the shipments table. There are also many rows that would appear to be duplicates based on the delivery_id and delivered_on_datetime fields (but again they aren't; other fields exist that I didn't list).
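To make the duplication concrete, here is a minimal reproduction using Python's built-in sqlite3 module as a stand-in for the real database. The column layout follows the description above, but the rows are made-up sample data and the extra columns that make rows distinct are omitted. Joining the raw shipments table repeats each delivery once per matching shipments row, so the summed quantity doubles.

```python
import sqlite3

# Made-up sample data: shipment 1 appears twice in shipments (same
# shipment_id and receive_by_datetime; the differing columns are omitted)
# and has two deliveries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (shipment_id INTEGER, receive_by_datetime TEXT, qty INTEGER);
CREATE TABLE deliveries (delivery_id INTEGER, shipment_id INTEGER,
                         delivered_on_datetime TEXT, qty INTEGER);
INSERT INTO shipments VALUES (1, '2012-05-01', 10), (1, '2012-05-01', 5);
INSERT INTO deliveries VALUES (100, 1, '2012-04-28', 7), (101, 1, '2012-04-29', 8);
""")

# Ground truth: the total quantity actually delivered.
true_total = conn.execute("SELECT SUM(qty) FROM deliveries").fetchone()[0]

# Naive join: every delivery row is repeated once per matching shipments row.
naive_total = conn.execute("""
    SELECT SUM(d.qty)
    FROM deliveries d
    JOIN shipments s ON s.shipment_id = d.shipment_id
""").fetchone()[0]

print(true_total, naive_total)  # prints: 15 30
```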

I am trying to pull one aggregated row per combination of delivered_on_datetime and receive_by_datetime, but because of the many-to-many relationship it's difficult. Is a query along these lines correct?

SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty)
FROM deliveries d
LEFT JOIN (
  SELECT DISTINCT s1.shipment_id, s1.receive_by_datetime
  FROM shipments s1
) s ON (s.shipment_id = d.shipment_id)
GROUP BY d.delivered_on_datetime, s.receive_by_datetime
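As a rough check, the query can be run against made-up data in SQLite (via Python's sqlite3, standing in for the actual database). Here shipment 1 is listed twice with a consistent receive_by_datetime, so the DISTINCT subquery returns one row per shipment_id and the grouped sums add up to SELECT SUM(qty) FROM deliveries. The ORDER BY is an addition for deterministic output only.

```python
import sqlite3

# Made-up data where each shipment_id has a single receive_by_datetime.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (shipment_id INTEGER, receive_by_datetime TEXT, qty INTEGER);
CREATE TABLE deliveries (delivery_id INTEGER, shipment_id INTEGER,
                         delivered_on_datetime TEXT, qty INTEGER);
INSERT INTO shipments VALUES (1, '2012-05-01', 10), (1, '2012-05-01', 5);
INSERT INTO deliveries VALUES (100, 1, '2012-04-28', 7), (101, 1, '2012-04-29', 8);
""")

# The query from the question, plus ORDER BY for a stable result order.
rows = conn.execute("""
    SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty)
    FROM deliveries d
    LEFT JOIN (
      SELECT DISTINCT s1.shipment_id, s1.receive_by_datetime
      FROM shipments s1
    ) s ON (s.shipment_id = d.shipment_id)
    GROUP BY d.delivered_on_datetime, s.receive_by_datetime
    ORDER BY d.delivered_on_datetime
""").fetchall()

print(rows)
# [('2012-04-28', '2012-05-01', 7), ('2012-04-29', '2012-05-01', 8)]
```

The sums total 15, matching the deliveries table, but only because the sample data never gives one shipment_id two different dates.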

Upvotes: 0

Views: 3083

Answers (2)

Seph

Reputation: 8703

You will run into problems where the total of SUM(d.qty) is larger than the value from SELECT SUM(qty) FROM deliveries, because a delivery row is counted once for every row it matches in the joined subquery.
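A small sketch of that failure, assuming made-up data in which shipment 1 has two different receive_by_datetime values (SQLite via Python's sqlite3 as a stand-in; ORDER BY added only for stable output). The DISTINCT subquery from the question then returns two rows for shipment 1, and the single delivery is summed twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (shipment_id INTEGER, receive_by_datetime TEXT, qty INTEGER);
CREATE TABLE deliveries (delivery_id INTEGER, shipment_id INTEGER,
                         delivered_on_datetime TEXT, qty INTEGER);
-- Shipment 1 has two different receive_by_datetime values.
INSERT INTO shipments VALUES (1, '2012-05-01', 10), (1, '2012-05-02', 5);
INSERT INTO deliveries VALUES (100, 1, '2012-04-28', 7);
""")

true_total = conn.execute("SELECT SUM(qty) FROM deliveries").fetchone()[0]

# The question's query: the DISTINCT subquery yields two rows for shipment 1,
# so the one delivery row is joined, and therefore summed, twice.
rows = conn.execute("""
    SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty)
    FROM deliveries d
    LEFT JOIN (
      SELECT DISTINCT s1.shipment_id, s1.receive_by_datetime
      FROM shipments s1
    ) s ON (s.shipment_id = d.shipment_id)
    GROUP BY d.delivered_on_datetime, s.receive_by_datetime
    ORDER BY s.receive_by_datetime
""").fetchall()
reported_total = sum(r[2] for r in rows)

print(true_total, reported_total)  # prints: 7 14
```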

Something like this might be better suited for you:

-- shipped_qty adds each shipment's total once per matching delivery row
SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty) AS delivered_qty, SUM(s.qty) AS shipped_qty
FROM deliveries d
LEFT JOIN (
  SELECT s1.shipment_id, s1.receive_by_datetime, SUM(s1.qty) AS qty
  FROM shipments s1
  GROUP BY s1.shipment_id, s1.receive_by_datetime
) s ON (s.shipment_id = d.shipment_id)
GROUP BY d.delivered_on_datetime, s.receive_by_datetime

If you have (or might have) a shipment_id with more than one receive_by_datetime value, and it's best practice to assume that something might have corrupted the data slightly, you can prevent rows from the deliveries table being duplicated, while still returning a valid result, by collapsing shipments to exactly one row per shipment_id:

-- shipped_qty adds each shipment's total once per matching delivery row
SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty) AS delivered_qty, SUM(s.qty) AS shipped_qty
FROM deliveries d
LEFT JOIN (
  SELECT s1.shipment_id, MAX(s1.receive_by_datetime) AS receive_by_datetime, SUM(s1.qty) AS qty
  FROM shipments s1
  GROUP BY s1.shipment_id
) s ON (s.shipment_id = d.shipment_id)
GROUP BY d.delivered_on_datetime, s.receive_by_datetime
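To compare the two subqueries in this answer, here is a sketch on made-up data where shipment 1 carries two different receive_by_datetime values. It uses SQLite via Python's sqlite3 as a stand-in, computes only delivered_qty, and adds ORDER BY for stable output; the QUERY template and variable names are illustrative only. Grouping the subquery by (shipment_id, receive_by_datetime) still produces two rows per shipment, while grouping by shipment_id alone with MAX keeps the delivered total intact.

```python
import sqlite3

# Made-up data: shipment 1 has two different receive_by_datetime values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (shipment_id INTEGER, receive_by_datetime TEXT, qty INTEGER);
CREATE TABLE deliveries (delivery_id INTEGER, shipment_id INTEGER,
                         delivered_on_datetime TEXT, qty INTEGER);
INSERT INTO shipments VALUES (1, '2012-05-01', 10), (1, '2012-05-02', 5);
INSERT INTO deliveries VALUES (100, 1, '2012-04-28', 7);
""")

# Outer query shared by both variants; only the joined subquery differs.
QUERY = """
    SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty) AS delivered_qty
    FROM deliveries d
    LEFT JOIN ({subquery}) s ON (s.shipment_id = d.shipment_id)
    GROUP BY d.delivered_on_datetime, s.receive_by_datetime
    ORDER BY s.receive_by_datetime
"""

# Keyed on (shipment_id, receive_by_datetime): one row per distinct date.
per_date = conn.execute(QUERY.format(subquery="""
    SELECT shipment_id, receive_by_datetime, SUM(qty) AS qty
    FROM shipments GROUP BY shipment_id, receive_by_datetime
""")).fetchall()

# Keyed on shipment_id alone: exactly one row per shipment.
per_shipment = conn.execute(QUERY.format(subquery="""
    SELECT shipment_id, MAX(receive_by_datetime) AS receive_by_datetime, SUM(qty) AS qty
    FROM shipments GROUP BY shipment_id
""")).fetchall()

print(per_date)      # [('2012-04-28', '2012-05-01', 7), ('2012-04-28', '2012-05-02', 7)]
print(per_shipment)  # [('2012-04-28', '2012-05-02', 7)]
```

The MAX variant returns the correct delivered total of 7, at the cost of silently picking the latest date when the data disagrees.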

Upvotes: 2

Sodved

Reputation: 8598

Yep, the problem with many-to-many is that you get the Cartesian product of the matching rows, so you end up counting the same row more than once: once for each row it matches in the other table.
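The Cartesian-product effect can be seen with made-up rows in SQLite (via Python's sqlite3, as a stand-in for the real database): if one shipment_id has 2 rows in shipments and 3 in deliveries, the join yields 2 × 3 = 6 rows, and every delivery quantity is counted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (shipment_id INTEGER, receive_by_datetime TEXT, qty INTEGER);
CREATE TABLE deliveries (delivery_id INTEGER, shipment_id INTEGER,
                         delivered_on_datetime TEXT, qty INTEGER);
-- 2 shipments rows and 3 deliveries rows share shipment_id 1.
INSERT INTO shipments VALUES (1, '2012-05-01', 10), (1, '2012-05-01', 5);
INSERT INTO deliveries VALUES
  (100, 1, '2012-04-27', 1), (101, 1, '2012-04-28', 2), (102, 1, '2012-04-29', 3);
""")

JOIN = "FROM deliveries d JOIN shipments s ON s.shipment_id = d.shipment_id"

# Every (delivery, shipment) pair with the same shipment_id becomes one row.
joined_rows = conn.execute(f"SELECT COUNT(*) {JOIN}").fetchone()[0]
# The true delivered total is 1 + 2 + 3 = 6; the join counts it twice.
joined_qty = conn.execute(f"SELECT SUM(d.qty) {JOIN}").fetchone()[0]

print(joined_rows, joined_qty)  # prints: 6 12
```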

"In shipments, the shipment_id and receive_by_datetime fields always match up."

If this means there cannot be two shipments with the same ID but different dates, then your query will work. But in general it is not safe: if the DISTINCT subquery could return more than one row per shipment ID, you will be subject to the double-counting issue. Generically this is a very tricky problem to solve; in fact, I see no way it could be solved with this data model.
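Whether the query is safe therefore depends on the data. One way to check, sketched here on made-up rows in SQLite (via Python's sqlite3, standing in for the real database), is to list the shipment IDs for which the DISTINCT subquery would return more than one row. An empty result means that, for the current data at least, no double counting occurs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (shipment_id INTEGER, receive_by_datetime TEXT, qty INTEGER);
INSERT INTO shipments VALUES
  (1, '2012-05-01', 10), (1, '2012-05-01', 5),  -- consistent dates
  (2, '2012-06-01', 4),  (2, '2012-06-03', 6);  -- two different dates
""")

# Shipment IDs that map to more than one receive_by_datetime.
conflicting = conn.execute("""
    SELECT shipment_id
    FROM shipments
    GROUP BY shipment_id
    HAVING COUNT(DISTINCT receive_by_datetime) > 1
""").fetchall()

print(conflicting)  # [(2,)]
```

This only inspects the data as it stands; it does not enforce the rule, so the check would need to be repeated as new rows arrive.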

Upvotes: 1
