Reputation: 305
I have 4 tables in a database. The warehouse contains boxes owned by clients, and the boxes have files in them. There is a Client
table, a Warehouse
table, a Boxes
table, and a Files
table.
So the Client
table has WarehouseID
as a foreign key, the Boxes
table has ClientID
as a foreign key, and the Files
table has BoxID
as a foreign key. I want to count the number of boxes and files that each client has in my query, as well as the number of boxes that are in and out of the warehouse. A Status
field on the Boxes
and Files
tables determines if the boxes and files are in or out of the warehouse. I run the following query on the boxes and the numbers are correct:
SELECT
[c].[ClientID],
[c].[Name] AS [ClientName],
[w].[Name] AS [WarehouseName],
COUNT(DISTINCT [b].[BoxID]) AS [BoxCount],
SUM(CASE WHEN [b].[Status] = @IN THEN 1 ELSE 0 END)) AS [BoxesIn],
SUM(CASE WHEN [b].[Status] = @OUT THEN 1 ELSE 0 END) AS [BoxesOut],
SUM(CASE WHEN [b].[DestructionDate] <= GETDATE() THEN 1 ELSE 0 END) AS [BoxesForDestruction],
FROM [Clients] AS [c] INNER JOIN [Boxes] AS [b]
ON [c].[ClientID] = [b].[ClientID]
INNER JOIN [Warehouses] AS [w]
ON [c].WarehouseID = [w].[WarehouseID]
WHERE [c].[ClientID] = @ClientID
GROUP BY
[c].[ClientID],
[c].[Name],
[w].[Name]
This produces the output of:
ClientID | ClientName | WarehouseName | BoxCount | BoxesIn | BoxesOut | BoxesForDestruction
1 | ACME Corp. | FooFactory | 22744 | 22699 | 45 | 7888
The output of the count is correct. When I add the Files
table to the INNER JOIN
then the numbers get inflated. Here is the SQL:
SELECT
[c].[ClientID],
[c].[Name] AS [ClientName],
[w].[Name] AS [WarehouseName],
COUNT(DISTINCT [b].[BoxID]) AS [BoxCount],
COUNT(DISTINCT [f].[FileID]) AS [FileCount], -- *NEW*
SUM(CASE WHEN [b].[Status] = @IN THEN 1 ELSE 0 END)) AS [BoxesIn],
SUM(CASE WHEN [b].[Status] = @OUT THEN 1 ELSE 0 END) AS [BoxesOut],
SUM(CASE WHEN [b].[DestructionDate] <= GETDATE() THEN 1 ELSE 0 END) AS [BoxesForDestruction],
FROM [Clients] AS [c] INNER JOIN [Boxes] AS [b]
ON [c].[ClientID] = [b].[ClientID]
INNER JOIN [Warehouses] AS [w]
ON [c].[WarehouseID] = [w].[WarehouseID]
INNER JOIN [Files] AS [f] -- *NEW*
ON [b].[BoxID] = [f].[BoxID] -- *NEW*
WHERE [c].[ClientID] = @ClientID
GROUP BY
[c].[ClientID],
[c].[Name],
[w].[Name]
This gives me the count output below (I've omitted the first 3 columns since they're not relevant):
BoxCount | FilesCount | BoxesIn | BoxesOut | BoxesForDestruction
19151 | 411961 | 411381 | 580 | 144615
The FilesCount
is correct, but the other numbers are off. I know why this is happening, but I'm not sure how to fix it. The extra rows are created due to the multiple rows returned by the join on the boxes and files. When performing the SUM
, the extra rows inflate the count. Since there is only one row for the warehouse, that join doesn't affect the count. How do I modify my query to get the correct number of files and boxes in and out of the warehouse?
Upvotes: 0
Views: 5448
Reputation: 11
As mentioned by Andomar, I included "as myColumnOne" and "myColumnTwo" besides Count(*), as it is required on SQL Server 2018:
select *
from table1 t1
join (
select table1_id
, count(*) as myColumnOne
from table2
group by
table1_id
) t2
on t2.table1_id = t1.id
join (
select table1_id
, count(*) as myColumnTwo
from table3
group by
table1_id
) t3
on t3.table1_id = t1.id
Upvotes: 0
Reputation: 238096
A join
repeats each row in the left hand table for each row in the right hand table. If you combine multiple joins some rows will be double counted. A solution is to move the count to a subquery. For example:
select *
from table1 t1
join (
select table1_id
, count(*)
from table2
group by
table1_id
) t2
on t2.table1_id = t1.id
join (
select table1_id
, count(*)
from table3
group by
table1_id
) t3
on t3.table1_id = t1.id
Upvotes: 2