Reputation: 23
I am coding a dashboard, and I need to pull some data out of Microsoft SQL Server.
For a simple example, I have three tables, one master Category table, and two tables containing values linked to the Category table via a primary/foreign key relationship (Blue and Green value tables).
Using Microsoft SQL Sever (t-sql), I wish to total (sum) the values in the two value tables, grouped by the common category found in the category table.
Category Table
CategoryID (PK) | CategoryName
1 | Square
2 | Circle
Blue Table
BlueID (PK) | CategoryID (FK) | BlueValue | BlueMonth | BlueYear
1 | 1 | 10 | 6 | 2012
2 | 1 | 20 | 12 | 2012
3 | 2 | 5 | 6 | 2012
4 | 2 | 9 | 12 | 2012
5 | 1 | 12 | 6 | 2013
6 | 1 | 21 | 12 | 2013
7 | 2 | 4 | 6 | 2013
8 | 2 | 8 | 12 | 2013
Green Table
GreenID (PK)| CategoryID (FK) | GreenValue| GreenMonth| GreenYear
1 | 1 | 3 | 6 | 2012
2 | 1 | 6 | 12 | 2012
3 | 2 | 2 | 6 | 2012
4 | 2 | 7 | 12 | 2012
5 | 1 | 2 | 6 | 2013
6 | 1 | 5 | 12 | 2013
7 | 2 | 4 | 6 | 2013
8 | 2 | 8 | 12 | 2013
If I use the following SQL, I get the results I expect.
SELECT
[Category].[CategoryName],
SUM([Green].[GreenValue]) AS [GreenTotal]
FROM
[Category]
LEFT JOIN
[Green] ON [Category].[CategoryID] = [Green].[CategoryID]
GROUP BY
[Category].[CategoryName]
Results:
CategoryName | GreenTotal
Square | 16
Triangle | 21
However, if I add the Blue table, to try and fetch a total for BlueValue as well, my obviously incorrect T-SQL gives me unexpected results.
SELECT
[Category].[CategoryName],
SUM([Green].[GreenValue]) AS [GreenTotal],
SUM([Blue].[BlueValue]) AS [BlueTotal]
FROM
[Category]
LEFT JOIN
[Green] ON [Category].[CategoryID] = [Green].[CategoryID]
LEFT JOIN
[Blue] ON [Category].[CategoryID] = [Blue].[CategoryID]
GROUP BY
[Category].[CategoryName]
Incorrect Results:
CategoryName | GreenTotal | BlueTotal
Square | 64 | 252
Triangle | 84 | 104
The results all seem to be out by a factor of 4, which is the total number of rows in each value table for each category.
I am aiming to see the following results:
CategoryName | GreenTotal | BlueTotal
Square | 16 | 63
Triangle | 21 | 26
I would be over the moon if someone could tell me what on earth I am doing wrong?
Thanks, Mark.
Upvotes: 2
Views: 440
Reputation: 206
I also use CTE, find it easier on the eye - but rank the selects internal.
/*
create table Category ( CategoryId Integer, CategoryName nvarchar(50) )
create table Green ( CategoryId Integer, GreenValue Integer )
create table Blue ( CategoryId Integer, BlueValue Integer )
insert into Category VALUES (1,'Square'),(2,'Circle')
insert into Blue VALUES (1,10),(1,20),(2,5),(2,9),(1,12),(1,21),(2,4),(2,8)
insert into Green VALUES (1,3),(1,6),(2,2),(2,7),(1,2),(1,5),(2,4),(2,8)
*/
with CatSums(ColorRank, CategoryId, CategoryValue) as
(
select 1, CategoryId, GreenValue from Green
union all
select 2, CategoryId, BlueValue from Blue
)
select
C.CategoryName,
Sum(case when ColorRank = 1 then CategoryValue else 0 end) as GreenTotal,
Sum(case when ColorRank = 2 then CategoryValue else 0 end) as BlueTotal
from CatSums S left join Category C on C.CategoryId = S.CategoryId
group by C.CategoryName
Upvotes: 0
Reputation: 17171
What you're getting is a Cartesian product. You can see the effects of this by removing the grouping and looking through the data.
For example; if your green table contained 2 rows and your blue table contained 4, your join would return a total of 8 records.
To resolve the problem, well, you're nearly there. You've got all the right pieces, just not put them together quite right.
Assuming the following query returns the correct results for green:
SELECT CategoryID
, Sum(GreenValue) As GreenTotal
FROM Green
GROUP
BY CategoryID
The results for blue can be retrieved by following the same method:
SELECT CategoryID
, Sum(BueValue) As BlueTotal
FROM Blue
GROUP
BY CategoryID
Now that we have two distinct results that are correct, we should join these results to our category table:
SELECT Category.CategoryName
, GreenSummary.GreenTotal
, BlueSummary.BlueTotal
FROM Category
LEFT
JOIN (
SELECT CategoryID
, Sum(GreenValue) As GreenTotal
FROM Green
GROUP
BY CategoryID
) As GreenSummary
ON GreenSummary.CategoryID = Category.CategoryID
LEFT
JOIN (
SELECT CategoryID
, Sum(BlueValue) As BlueTotal
FROM Blue
GROUP
BY CategoryID
) As BlueSummary
ON BlueSummary.CategoryID = Category.CategoryID
Upvotes: 2
Reputation: 910
I would sum them up first with CTE. Then simply join the 2 together on what is common with only 1 occurrence in each, the CategoryName. This way you can't get a Cartesian product. I put the isnull statement in because there is a possibility that there are no results for a CategoryName in Blue or in Green. If you didn't do this you could get null for your CategoryName.
WITH GREENSUM as (
SELECT
[Category].[CategoryName],
SUM([Green].[GreenValue]) AS [GreenTotal]
FROM
[Category]
LEFT JOIN
[Green] ON [Category].[CategoryID] = [Green].[CategoryID]
GROUP BY
[Category].[CategoryName]
),
WITH BLUESUM as (
SELECT
[Category].[CategoryName],
SUM([Blue].[BlueValue]) AS [BlueTotal]
FROM
[Category]
LEFT JOIN
[Blue] ON [Category].[CategoryID] = [Blue].[CategoryID]
GROUP BY
[Category].[CategoryName])
SELECT isnull(GREENSUM.CategoryName, BLUESUM.CategoryName) as CategoryName,
GreenTotal, BlueTotal
FROM [GREENSUM]
FULL OUTER JOIN
[BLUESUM] ON [GREENSUM].CategoryName = [BLUESUM].CategoryName)
Upvotes: 0
Reputation: 1660
Something like this would be best done with an APPLY in my opinion. Fast performance-wise, simple to use, and easy to control in case of variations in the query.
IE:
SELECT C.[CategoryName], G.[GreenTotal], B.[BlueTotal]
FROM [Category] C
OUTER APPLY (SELECT SUM([GreenValue]) AS [GreenTotal] FROM [Green] WHERE [CategoryID] = C.CategoryID) G
OUTER APPLY (SELECT SUM([BlueValue]) AS [BlueTotal] FROM [Blue] WHERE [CategoryID] = C.CategoryID) B
Upvotes: 4
Reputation: 1382
Create a query for each total table. Group by category, create the sum column and add the column id.
Then use this querys as sub query and make a left outer join with the main table. This can give you the result expected and can have null values when the sum is not available. You can use the isnull function to convert the null values to 0.
Upvotes: 0