SQL COUNT on 3 tables with JOIN

Question

I have 4 tables in a database. The warehouse contains boxes owned by clients, and the boxes have files in them. There is a Client table, a Warehouse table, a Boxes table, and a Files table.

So the Client table has WarehouseID as a foreign key, the Boxes table has ClientID as a foreign key, and the Files table has BoxID as a foreign key. I want to count the number of boxes and files that each client has in my query, as well as the number of boxes that are in and out of the warehouse. A Status field on the Boxes and Files tables determines if the boxes and files are in or out of the warehouse. I run the following query on the boxes and the numbers are correct:

SELECT
    [c].[ClientID],
    [c].[Name] AS [ClientName],
    [w].[Name] AS [WarehouseName],
    COUNT(DISTINCT [b].[BoxID]) AS [BoxCount],
    SUM(CASE WHEN [b].[Status] = @IN THEN 1 ELSE 0 END)) AS [BoxesIn],
    SUM(CASE WHEN [b].[Status] = @OUT THEN 1 ELSE 0 END) AS [BoxesOut],
    SUM(CASE WHEN [b].[DestructionDate] <= GETDATE() THEN 1 ELSE 0 END) AS [BoxesForDestruction],
FROM [Clients] AS [c] INNER JOIN [Boxes] AS [b]
  ON [c].[ClientID] = [b].[ClientID]
INNER JOIN [Warehouses] AS [w]
  ON [c].WarehouseID = [w].[WarehouseID]
WHERE [c].[ClientID] = @ClientID
GROUP BY
  [c].[ClientID],
  [c].[Name],
  [w].[Name]

This produces the output of:

ClientID | ClientName | WarehouseName | BoxCount | BoxesIn | BoxesOut | BoxesForDestruction
1        | ACME Corp. | FooFactory    | 22744    | 22699   | 45       | 7888

The output of the count is correct. When I add the Files table to the INNER JOIN then the numbers get inflated. Here is the SQL:

SELECT
    [c].[ClientID],
    [c].[Name] AS [ClientName],
    [w].[Name] AS [WarehouseName],
    COUNT(DISTINCT [b].[BoxID]) AS [BoxCount],
    COUNT(DISTINCT [f].[FileID]) AS [FileCount], -- *NEW*
    SUM(CASE WHEN [b].[Status] = @IN THEN 1 ELSE 0 END)) AS [BoxesIn],
    SUM(CASE WHEN [b].[Status] = @OUT THEN 1 ELSE 0 END) AS [BoxesOut],
    SUM(CASE WHEN [b].[DestructionDate] <= GETDATE() THEN 1 ELSE 0 END) AS [BoxesForDestruction],
FROM [Clients] AS [c] INNER JOIN [Boxes] AS [b]
  ON [c].[ClientID] = [b].[ClientID]
INNER JOIN [Warehouses] AS [w]
  ON [c].[WarehouseID] = [w].[WarehouseID]
INNER JOIN [Files] AS [f]      -- *NEW*
  ON [b].[BoxID] = [f].[BoxID] -- *NEW*
WHERE [c].[ClientID] = @ClientID
GROUP BY
  [c].[ClientID],
  [c].[Name],
  [w].[Name]

This gives me the count output below (I've omitted the first 3 columns since they're not relevant):

BoxCount | FilesCount | BoxesIn | BoxesOut | BoxesForDestruction 
19151    | 411961     | 411381  | 580      | 144615

The FilesCount is correct, but the other numbers are off. I know why this is happening, but I'm not sure how to fix it. The extra rows are created due to the multiple rows returned by the join on the boxes and files. When performing the SUM, the extra rows inflate the count. Since there is only one row for the warehouse, that join doesn't affect the count. How do I modify my query to get the correct number of files and boxes in and out of the warehouse?

Andomar · Accepted Answer

A join repeats each row in the left hand table for each row in the right hand table. If you combine multiple joins some rows will be double counted. A solution is to move the count to a subquery. For example:

select  *
from    table1 t1
join    (
        select  table1_id
        ,       count(*)
        from    table2
        group by
                table1_id
        ) t2
on      t2.table1_id = t1.id
join    (
        select  table1_id
        ,       count(*)
        from    table3
        group by
                table1_id
        ) t3
on      t3.table1_id = t1.id

SQL COUNT on 3 tables with JOIN

Answers (2)

Related Questions