Reputation: 123
I have the following tables:
STUDYA (Alias J)
LOGINID RCD_NUM DATE TYPE
8745 0 04/15/2018 PRELIM
6548 0 08/19/2017 PRELIM
7445 0 10/02/2017 PRELIM
2867 0 03/19/2018 PRELIM
TRIALS (Alias G)
LOGINID RCD_NUM DATE TYPE
8745 0 02/15/2017 ROLLUP
7445 0 07/09/2016 ROLLUP
2867 0 05/17/2017 ROLLUP
2867 0 05/28/2017 ROLLUP
5249 0 06/20/2017 ROLLUP
1335 0 09/29/2017 ROLLUP
9238 0 12/03/2017 ROLLUP
SPRINT (Alias H)
LOGINID RCD_NUM DATE TYPE
5521 0 01/10/2018 SPRNT
8745 1 04/04/2018 SPRNT
3487 0 11/14/2017 SPRNT
6627 0 05/05/2018 SPRNT
And another table called LOGIN that I am matching the LOGINID's on.
I have the following query running on SQL Server 2014:
SELECT 'COUNTS', COUNT(G.LOGINID), COUNT(H.LOGINID), COUNT(J.LOGINID)
FROM LOGIN F
LEFT OUTER JOIN TRIALS G ON F.LOGINID= G.LOGINID AND G.RCD_NUM =
F.RCD_NUM
LEFT OUTER JOIN SPRINT H ON F.LOGINID= H.LOGINID AND H.RCD_NUM =
F.RCD_NUM
LEFT OUTER JOIN STUDYA J ON F.LOGINID= J.LOGINID AND J.RCD_NUM =
F.RCD_NUM
WHERE ( ( F.EFFDT =
(SELECT MAX(F_ED.EFFDT) FROM PS_JOB F_ED
WHERE F.LOGINID = F_ED.LOGINID
AND F.RCD_NUM = F_ED.RCD_NUM
AND F_ED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10))
AND F.EFFSEQ =
(SELECT MAX(F_ES.EFFSEQ) FROM PS_JOB F_ES
WHERE F.LOGINID = F_ES.LOGINID
AND F.RCD_NUM = F_ES.RCD_NUM
AND F.EFFDT = F_ES.EFFDT) )
Here is the result of the above query:
(No column name) (No column name) (No column name) (No column name)
COUNTS 9 5 5
You can see the last column (COUNT(J.LOGINID)) is displaying 5 records, when in the actual table (above query) there is in fact only 4 records. I think it is because LOGINID 2867 has 2 rows in the TRIALS table, and 1 in STUDYA.
I've researched this and the recommendation seems to be doing the aggregation before the Joins. I'm struggling as to the best way of doing this though. Any help is appreciated!
Upvotes: 3
Views: 86
Reputation: 86765
Depending on your data and use-case, you may get performance benefit from the approach you referred to: aggregate then join...
SELECT
'COUNTS',
COUNT(G.LOGINID) distinct_g, SUM(G.ROW_COUNT) row_count_g,
COUNT(H.LOGINID) distinct_h, SUM(H.ROW_COUNT) row_count_h,
COUNT(J.LOGINID) distinct_j, SUM(J.ROW_COUNT) row_count_j
FROM
LOGIN F
LEFT OUTER JOIN
(
SELECT LOGINID, RCD_NUM, COUNT(*) AS ROW_COUNT FROM TRIALS GROUP BY LOGINID, RCD_NUM
)
G
ON F.LOGINID = G.LOGINID
AND F.RCD_NUM = G.RCD_NUM
LEFT OUTER JOIN
(
SELECT LOGINID, RCD_NUM, COUNT(*) AS ROW_COUNT FROM SPRINT GROUP BY LOGINID, RCD_NUM
)
H
ON F.LOGINID = H.LOGINID
AND F.RCD_NUM = H.RCD_NUM
LEFT OUTER JOIN
(
SELECT LOGINID, RCD_NUM, COUNT(*) AS ROW_COUNT FROM STUDYA GROUP BY LOGINID, RCD_NUM
)
J
ON F.LOGINID = J.LOGINID
AND F.RCD_NUM = J.RCD_NUM
WHERE ( ( F.EFFDT =
(SELECT MAX(F_ED.EFFDT) FROM PS_JOB F_ED
WHERE F.LOGINID = F_ED.LOGINID
AND F.RCD_NUM = F_ED.RCD_NUM
AND F_ED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10))
AND F.EFFSEQ =
(SELECT MAX(F_ES.EFFSEQ) FROM PS_JOB F_ES
WHERE F.LOGINID = F_ES.LOGINID
AND F.RCD_NUM = F_ES.RCD_NUM
AND F.EFFDT = F_ES.EFFDT) )
The optimiser can use predicate-push-down / macro-like-expansion to ensure the aggregation is only done for relevant rows and preserve use of indexes.
This pattern also allows you to get other statistics, such as the min/max date for each login/rcd combination.
EDIT:
Another pattern could include...
SELECT
'COUNTS',
SUM(G.ROW_COUNT) row_count_g,
SUM(H.ROW_COUNT) row_count_h,
SUM(J.ROW_COUNT) row_count_j
FROM
LOGIN F
OUTER APPLY
(
SELECT COUNT(*) AS ROW_COUNT FROM TRIALS WHERE LOGINID = F.LOGINID AND RCD_NUM = F.RCD_NUM
)
G
OUTER APPLY
(
SELECT COUNT(*) AS ROW_COUNT FROM SPRINT WHERE LOGINID = F.LOGINID AND RCD_NUM = F.RCD_NUM
)
H
OUTER APPLY
(
SELECT COUNT(*) AS ROW_COUNT FROM STUDYA WHERE LOGINID = F.LOGINID AND RCD_NUM = F.RCD_NUM
)
J
WHERE ( ( F.EFFDT =
(SELECT MAX(F_ED.EFFDT) FROM PS_JOB F_ED
WHERE F.LOGINID = F_ED.LOGINID
AND F.RCD_NUM = F_ED.RCD_NUM
AND F_ED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10))
AND F.EFFSEQ =
(SELECT MAX(F_ES.EFFSEQ) FROM PS_JOB F_ES
WHERE F.LOGINID = F_ES.LOGINID
AND F.RCD_NUM = F_ES.RCD_NUM
AND F.EFFDT = F_ES.EFFDT) )
Upvotes: 2
Reputation: 966
You need to get distinct counts
SELECT 'COUNTS', COUNT(distinct G.LOGINID), COUNT(distinct
H.LOGINID), COUNT(distinct J.LOGINID) ..
Upvotes: 1