Olaru Mircea
Olaru Mircea

Reputation: 2620

Multiple joins with aggregates

I have the two following tables:

Person:

EntityId    FirstName          LastName
----------- ------------------ -----------------
1           Ion                Ionel
2           Fane               Fanel
3           George             Georgel
4           Mircea             Mircel

SalesQuotaHistory

SalesQuotaId EntityId    SalesQuota  SalesOrderDate
------------ ----------- ----------- -----------------------
1            1           1000        2014-01-01 00:00:00.000
2            1           1000        2014-01-02 00:00:00.000
3            1           1000        2014-01-03 00:00:00.000
4            3           3000        2013-01-01 00:00:00.000
5            3           3000        2013-01-01 00:00:00.000
7            4           4000        2015-01-01 00:00:00.000
8            4           4000        2015-01-02 00:00:00.000
9            4           4000        2015-01-03 00:00:00.000
10           1           1000        2015-01-01 00:00:00.000
11           1           1000        2015-01-02 00:00:00.000

I am trying to get the SalesQuota for each user in 2014 and 2015.

Using this query i am getting an erroneous result:

SELECT p.EntityId
  , p.FirstName
  , SUM(sqh2014.SalesQuota) AS '2014'
  , SUM(sqh2015.SalesQuota) AS '2015'
FROM Person p
LEFT OUTER JOIN SalesQuotaHistory sqh2014
   ON p.EntityId = sqh2014.EntityId
    AND YEAR(sqh2014.SalesOrderDate) = 2014
LEFT OUTER JOIN SalesQuotaHistory sqh2015
   ON p.EntityId = sqh2015.EntityId
    AND YEAR(sqh2015.SalesOrderDate) = 2015
    GROUP BY p.EntityId, p.FirstName

   EntityId    FirstName      2014        2015
   ---------   -----------   ----------   --------------------
    1          Ion           6000         6000
    2          Fane          NULL         NULL
    3          George        NULL         NULL
    4          Mircea        NULL         12000

In fact, Id 1 has a total SalesQuota of 3000 in 2014 and 2000 in 2015.

What i am asking here, is .. what is really happening behind the scenes? What is the order of operation in this specific case?

Thanks to my last post i was able to solve this using the following query:

SELECT p.EntityId
   , p.FirstName
   , SUM(CASE WHEN YEAR(sqh.SalesOrderDate) = 2014 THEN sqh.SalesQuota ELSE 0 END) AS '2014'
   , SUM(CASE WHEN YEAR(sqh.SalesOrderDate) = 2015 THEN sqh.SalesQuota ELSE 0 END) AS '2015'
FROM Person p
LEFT OUTER JOIN SalesQuotaHistory sqh
   ON p.EntityId = sqh.EntityId
   GROUP BY p.EntityId, p.FirstName


EntityId    FirstName             2014        2015
----------- --------------------- ----------- -----------
1           Ion                   3000        2000
2           Fane                  0           0
3           George                0           0
4           Mircea                0           12000

but without understanding what's wrong with the first attempt .. i can't get over this ..

Any explanation would be greatly appreciated.

Upvotes: 0

Views: 68

Answers (3)

Juan Carlos Oropeza
Juan Carlos Oropeza

Reputation: 48197

Is easy to see what is happening if you change your select to

SELECT *

and remove the group by

You first approach need something like this

Sql Fiddle Demo

SELECT p.[EntityId]
  , p.FirstName
  , COALESCE(s2014,0) as [2014]
  , COALESCE(s2015,0) as [2015]
FROM Person p
LEFT JOIN (SELECT EntityId, SUM(SalesQuota) s2014
           FROM SalesQuotaHistory 
           WHERE YEAR(SalesOrderDate) = 2014
           GROUP BY EntityId
          ) as s1
      ON p.[EntityId] = s1.EntityId
LEFT JOIN (SELECT EntityId, SUM(SalesQuota) s2015
           FROM SalesQuotaHistory 
           WHERE YEAR(SalesOrderDate) = 2015
           GROUP BY EntityId
          ) as s2
      ON p.[EntityId] = s2.EntityId

Joining with the result data only if exist for that id and year.

OUTPUT

| EntityId | FirstName | 2014 |  2015 |
|----------|-----------|------|-------|
|        1 |       Ion | 3000 |  2000 |
|        2 |      Fane |    0 |     0 |
|        3 |    George |    0 |     0 |
|        4 |    Mircea |    0 | 12000 |

Upvotes: 2

Radu Gheorghiu
Radu Gheorghiu

Reputation: 20489

You could improve the speed of your query by adding a WHERE condition to filter only the years over which you're looking for data:

SELECT p.EntityId
   , p.FirstName
   , SUM(CASE WHEN YEAR(sqh.SalesOrderDate) = 2014
         THEN sqh.SalesQuota ELSE 0 END) AS '2014'
   , SUM(CASE WHEN YEAR(sqh.SalesOrderDate) = 2015 
         THEN sqh.SalesQuota ELSE 0 END) AS '2015'
FROM Person p
LEFT OUTER JOIN SalesQuotaHistory sqh
   ON p.EntityId = sqh.EntityId
WHERE YEAR(sqh.SalesOrderDate) IN (2014, 2015)
GROUP BY p.EntityId, p.FirstName

Otherwise, the query that you found is the way to go (good job!)

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269773

You have multiple rows for each year, so the first method is producing a Cartesian product.

For instance, consider EntityId 100:

1            1           1000        2014-01-01 00:00:00.000
2            1           1000        2014-01-02 00:00:00.000
3            1           1000        2014-01-03 00:00:00.000
10           1           1000        2015-01-01 00:00:00.000
11           1           1000        2015-01-02 00:00:00.000

The intermediate result from the join produces six rows, with these SalesQuotaId:

1     10
1     11
2     10
2     11
3     10
3     11

You can then do the math -- the result is off because of the multiple rows.

You seem to know how to fix the problem. The conditional aggregation approach produces the correct answer.

Upvotes: 1

Related Questions