Reputation: 2620
I have the two following tables:
Person:
EntityId FirstName LastName
----------- ------------------ -----------------
1 Ion Ionel
2 Fane Fanel
3 George Georgel
4 Mircea Mircel
SalesQuotaHistory
SalesQuotaId EntityId SalesQuota SalesOrderDate
------------ ----------- ----------- -----------------------
1 1 1000 2014-01-01 00:00:00.000
2 1 1000 2014-01-02 00:00:00.000
3 1 1000 2014-01-03 00:00:00.000
4 3 3000 2013-01-01 00:00:00.000
5 3 3000 2013-01-01 00:00:00.000
7 4 4000 2015-01-01 00:00:00.000
8 4 4000 2015-01-02 00:00:00.000
9 4 4000 2015-01-03 00:00:00.000
10 1 1000 2015-01-01 00:00:00.000
11 1 1000 2015-01-02 00:00:00.000
I am trying to get the SalesQuota for each user in 2014 and 2015.
Using this query i am getting an erroneous result:
SELECT p.EntityId
, p.FirstName
, SUM(sqh2014.SalesQuota) AS '2014'
, SUM(sqh2015.SalesQuota) AS '2015'
FROM Person p
LEFT OUTER JOIN SalesQuotaHistory sqh2014
ON p.EntityId = sqh2014.EntityId
AND YEAR(sqh2014.SalesOrderDate) = 2014
LEFT OUTER JOIN SalesQuotaHistory sqh2015
ON p.EntityId = sqh2015.EntityId
AND YEAR(sqh2015.SalesOrderDate) = 2015
GROUP BY p.EntityId, p.FirstName
EntityId FirstName 2014 2015
--------- ----------- ---------- --------------------
1 Ion 6000 6000
2 Fane NULL NULL
3 George NULL NULL
4 Mircea NULL 12000
In fact, Id 1 has a total SalesQuota of 3000 in 2014 and 2000 in 2015.
What i am asking here, is .. what is really happening behind the scenes? What is the order of operation in this specific case?
Thanks to my last post i was able to solve this using the following query:
SELECT p.EntityId
, p.FirstName
, SUM(CASE WHEN YEAR(sqh.SalesOrderDate) = 2014 THEN sqh.SalesQuota ELSE 0 END) AS '2014'
, SUM(CASE WHEN YEAR(sqh.SalesOrderDate) = 2015 THEN sqh.SalesQuota ELSE 0 END) AS '2015'
FROM Person p
LEFT OUTER JOIN SalesQuotaHistory sqh
ON p.EntityId = sqh.EntityId
GROUP BY p.EntityId, p.FirstName
EntityId FirstName 2014 2015
----------- --------------------- ----------- -----------
1 Ion 3000 2000
2 Fane 0 0
3 George 0 0
4 Mircea 0 12000
but without understanding what's wrong with the first attempt .. i can't get over this ..
Any explanation would be greatly appreciated.
Upvotes: 0
Views: 68
Reputation: 48197
Is easy to see what is happening if you change your select to
SELECT *
and remove the group by
You first approach need something like this
SELECT p.[EntityId]
, p.FirstName
, COALESCE(s2014,0) as [2014]
, COALESCE(s2015,0) as [2015]
FROM Person p
LEFT JOIN (SELECT EntityId, SUM(SalesQuota) s2014
FROM SalesQuotaHistory
WHERE YEAR(SalesOrderDate) = 2014
GROUP BY EntityId
) as s1
ON p.[EntityId] = s1.EntityId
LEFT JOIN (SELECT EntityId, SUM(SalesQuota) s2015
FROM SalesQuotaHistory
WHERE YEAR(SalesOrderDate) = 2015
GROUP BY EntityId
) as s2
ON p.[EntityId] = s2.EntityId
Joining with the result data only if exist for that id
and year
.
OUTPUT
| EntityId | FirstName | 2014 | 2015 |
|----------|-----------|------|-------|
| 1 | Ion | 3000 | 2000 |
| 2 | Fane | 0 | 0 |
| 3 | George | 0 | 0 |
| 4 | Mircea | 0 | 12000 |
Upvotes: 2
Reputation: 20489
You could improve the speed of your query by adding a WHERE condition to filter only the years over which you're looking for data:
SELECT p.EntityId
, p.FirstName
, SUM(CASE WHEN YEAR(sqh.SalesOrderDate) = 2014
THEN sqh.SalesQuota ELSE 0 END) AS '2014'
, SUM(CASE WHEN YEAR(sqh.SalesOrderDate) = 2015
THEN sqh.SalesQuota ELSE 0 END) AS '2015'
FROM Person p
LEFT OUTER JOIN SalesQuotaHistory sqh
ON p.EntityId = sqh.EntityId
WHERE YEAR(sqh.SalesOrderDate) IN (2014, 2015)
GROUP BY p.EntityId, p.FirstName
Otherwise, the query that you found is the way to go (good job!)
Upvotes: 1
Reputation: 1269773
You have multiple rows for each year, so the first method is producing a Cartesian product.
For instance, consider EntityId
100:
1 1 1000 2014-01-01 00:00:00.000
2 1 1000 2014-01-02 00:00:00.000
3 1 1000 2014-01-03 00:00:00.000
10 1 1000 2015-01-01 00:00:00.000
11 1 1000 2015-01-02 00:00:00.000
The intermediate result from the join produces six rows, with these SalesQuotaId
:
1 10
1 11
2 10
2 11
3 10
3 11
You can then do the math -- the result is off because of the multiple rows.
You seem to know how to fix the problem. The conditional aggregation approach produces the correct answer.
Upvotes: 1