JBinson88
JBinson88

Reputation: 123

Why is my Count on joined tables not correct?

I have the following tables:

STUDYA (Alias J)

LOGINID   RCD_NUM         DATE         TYPE
 8745       0         04/15/2018      PRELIM
 6548       0         08/19/2017      PRELIM
 7445       0         10/02/2017      PRELIM
 2867       0         03/19/2018      PRELIM

TRIALS (Alias G)

LOGINID   RCD_NUM         DATE         TYPE
 8745       0          02/15/2017     ROLLUP
 7445       0          07/09/2016     ROLLUP
 2867       0          05/17/2017     ROLLUP
 2867       0          05/28/2017     ROLLUP
 5249       0          06/20/2017     ROLLUP
 1335       0          09/29/2017     ROLLUP
 9238       0          12/03/2017     ROLLUP

SPRINT (Alias H)

 LOGINID   RCD_NUM         DATE        TYPE
  5521       0          01/10/2018     SPRNT
  8745       1          04/04/2018     SPRNT
  3487       0          11/14/2017     SPRNT
  6627       0          05/05/2018     SPRNT

And another table called LOGIN that I am matching the LOGINID's on.

I have the following query running on SQL Server 2014:

SELECT 'COUNTS', COUNT(G.LOGINID), COUNT(H.LOGINID), COUNT(J.LOGINID)
 FROM LOGIN F 
LEFT OUTER JOIN  TRIALS G ON  F.LOGINID= G.LOGINID AND G.RCD_NUM = 
 F.RCD_NUM 
LEFT OUTER JOIN  SPRINT H ON  F.LOGINID= H.LOGINID AND H.RCD_NUM = 
 F.RCD_NUM
LEFT OUTER JOIN  STUDYA J ON  F.LOGINID= J.LOGINID AND J.RCD_NUM = 
 F.RCD_NUM  

WHERE ( ( F.EFFDT = 
(SELECT MAX(F_ED.EFFDT) FROM PS_JOB F_ED 
WHERE F.LOGINID = F_ED.LOGINID 
  AND F.RCD_NUM = F_ED.RCD_NUM 
  AND F_ED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10)) 
AND F.EFFSEQ = 
(SELECT MAX(F_ES.EFFSEQ) FROM PS_JOB F_ES 
 WHERE F.LOGINID = F_ES.LOGINID 
  AND F.RCD_NUM = F_ES.RCD_NUM 
  AND F.EFFDT = F_ES.EFFDT) )

Here is the result of the above query:

(No column name)   (No column name)   (No column name)   (No column name) 
  COUNTS                  9                  5                  5

You can see the last column (COUNT(J.LOGINID)) is displaying 5 records, when in the actual table (above query) there is in fact only 4 records. I think it is because LOGINID 2867 has 2 rows in the TRIALS table, and 1 in STUDYA.

I've researched this and the recommendation seems to be doing the aggregation before the Joins. I'm struggling as to the best way of doing this though. Any help is appreciated!

Upvotes: 3

Views: 86

Answers (2)

MatBailie
MatBailie

Reputation: 86765

Depending on your data and use-case, you may get performance benefit from the approach you referred to: aggregate then join...

SELECT
 'COUNTS',
 COUNT(G.LOGINID) distinct_g, SUM(G.ROW_COUNT) row_count_g,
 COUNT(H.LOGINID) distinct_h, SUM(H.ROW_COUNT) row_count_h,
 COUNT(J.LOGINID) distinct_j, SUM(J.ROW_COUNT) row_count_j
FROM
 LOGIN F 
LEFT OUTER JOIN
(
 SELECT LOGINID, RCD_NUM, COUNT(*) AS ROW_COUNT FROM TRIALS GROUP BY LOGINID, RCD_NUM
)
 G
  ON  F.LOGINID = G.LOGINID
  AND F.RCD_NUM = G.RCD_NUM 
LEFT OUTER JOIN
(
 SELECT LOGINID, RCD_NUM, COUNT(*) AS ROW_COUNT FROM SPRINT GROUP BY LOGINID, RCD_NUM
)
 H
  ON  F.LOGINID = H.LOGINID
  AND F.RCD_NUM = H.RCD_NUM 
LEFT OUTER JOIN
(
 SELECT LOGINID, RCD_NUM, COUNT(*) AS ROW_COUNT FROM STUDYA GROUP BY LOGINID, RCD_NUM
)
 J
  ON  F.LOGINID = J.LOGINID
  AND F.RCD_NUM = J.RCD_NUM 

WHERE ( ( F.EFFDT = 
(SELECT MAX(F_ED.EFFDT) FROM PS_JOB F_ED 
WHERE F.LOGINID = F_ED.LOGINID 
  AND F.RCD_NUM = F_ED.RCD_NUM 
  AND F_ED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10)) 
AND F.EFFSEQ = 
(SELECT MAX(F_ES.EFFSEQ) FROM PS_JOB F_ES 
 WHERE F.LOGINID = F_ES.LOGINID 
  AND F.RCD_NUM = F_ES.RCD_NUM 
  AND F.EFFDT = F_ES.EFFDT) )

The optimiser can use predicate-push-down / macro-like-expansion to ensure the aggregation is only done for relevant rows and preserve use of indexes.

This pattern also allows you to get other statistics, such as the min/max date for each login/rcd combination.

EDIT:

Another pattern could include...

SELECT
 'COUNTS',
 SUM(G.ROW_COUNT) row_count_g,
 SUM(H.ROW_COUNT) row_count_h,
 SUM(J.ROW_COUNT) row_count_j
FROM
 LOGIN F 
OUTER APPLY
(
 SELECT COUNT(*) AS ROW_COUNT FROM TRIALS WHERE LOGINID = F.LOGINID AND RCD_NUM = F.RCD_NUM
)
  G
OUTER APPLY
(
 SELECT COUNT(*) AS ROW_COUNT FROM SPRINT WHERE LOGINID = F.LOGINID AND RCD_NUM = F.RCD_NUM
)
  H
OUTER APPLY
(
 SELECT COUNT(*) AS ROW_COUNT FROM STUDYA WHERE LOGINID = F.LOGINID AND RCD_NUM = F.RCD_NUM
)
  J

WHERE ( ( F.EFFDT = 
(SELECT MAX(F_ED.EFFDT) FROM PS_JOB F_ED 
WHERE F.LOGINID = F_ED.LOGINID 
  AND F.RCD_NUM = F_ED.RCD_NUM 
  AND F_ED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10)) 
AND F.EFFSEQ = 
(SELECT MAX(F_ES.EFFSEQ) FROM PS_JOB F_ES 
 WHERE F.LOGINID = F_ES.LOGINID 
  AND F.RCD_NUM = F_ES.RCD_NUM 
  AND F.EFFDT = F_ES.EFFDT) )

Upvotes: 2

Pelin
Pelin

Reputation: 966

You need to get distinct counts

 SELECT 'COUNTS', COUNT(distinct G.LOGINID), COUNT(distinct 
 H.LOGINID), COUNT(distinct J.LOGINID) ..

Upvotes: 1

Related Questions