Select count Distinct based on combination of two columns

Question

I have the below query that gets counts of distinct USERID's from the various tables and sums them to a grand total. I am expecting a total of 35 as the results, however I am only getting 30 as a result from this query. What it appears to be doing is when it finds the same USERID in more than one row in any table, it is counting them only once (It is fine that USERID's appear more than once in a table based on how it was structured).

I would like to get Distinct values based on the combination of USERID and EXAM_DT, as this combination will satisfy the uniqueness I need.

SQL:

SELECT 'TOTAL', '', COUNT (DISTINCT G.USERID) + COUNT (DISTINCT H.USERID) + 
COUNT (DISTINCT J.USERID) + COUNT (DISTINCT M.USERID) + COUNT (DISTINCT 
P.USERID) + COUNT(DISTINCT S.USERID) + COUNT (DISTINCT V.USERID) + COUNT ( 
DISTINCT Y.USERID) 

FROM PS_JOB F INNER JOIN PS_EMPLMT_SRCH_QRY F1 ON (F.USERID = 
F1.USERID AND F.EMPL_RCD = F1.EMPL_RCD ) 
LEFT OUTER JOIN  PS_GHS_HS_ANN_EXAM G ON  F.USERID = G.USERID AND G.EMPL_RCD 
 = F.EMPL_RCD  
LEFT OUTER JOIN  PS_GHS_HS_ANTINEO H ON  F.USERID = H.USERID AND H.EMPL_RCD 
 = F.EMPL_RCD  
LEFT OUTER JOIN  PS_GHS_HS_AUDIO J ON  F.USERID = J.USERID AND J.EMPL_RCD = 
 F.EMPL_RCD  
LEFT OUTER JOIN  PS_GHS_HS_DOT M ON  F.USERID = M.USERID AND M.EMPL_RCD = 
 F.EMPL_RCD  
LEFT OUTER JOIN  PS_GHS_HS_HAZMAT P ON  F.USERID = P.USERID AND P.EMPL_RCD = 
 F.EMPL_RCD  
LEFT OUTER JOIN  PS_GHS_HS_PREPLACE S ON  F.USERID = S.USERID AND S.EMPL_RCD 
 = F.EMPL_RCD  
LEFT OUTER JOIN  PS_GH_RESP_FIT V ON  F.USERID = V.USERID AND V.EMPL_RCD = 
 F.EMPL_RCD  
LEFT OUTER JOIN  PS_GHS_HS_ASBESTOS Y ON  F.USERID = Y.USERID AND Y.USERID = 
 F.USERID

WHERE ( ( F.EFFDT = 
    (SELECT MAX(F_ED.EFFDT) FROM PS_JOB F_ED 
    WHERE F.USERID = F_ED.USERID 
      AND F.EMPL_RCD = F_ED.EMPL_RCD 
      AND F_ED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10)) 
AND F.EFFSEQ = 
    (SELECT MAX(F_ES.EFFSEQ) FROM PS_JOB F_ES 
    WHERE F.USERID = F_ES.USERID 
      AND F.EMPL_RCD = F_ES.EMPL_RCD 
      AND F.EFFDT = F_ES.EFFDT) ))

My results:

 (No column name)      (No column name)      (No column name)
  TOTAL                                       30

Here is an example from one of the tables in the query that contains the USERID 816455 twice, but only counting (in above query) one distinct occurrence of it (when I need the distinct to be based on the combination of USERID and EXAM_DT)

 USERID       USER_RCD       EXAM_DT       EXAM_TYPE_CD       EXPIRE_DT
 001          0              2018-04-17    ANN                2019-04-17
 03           0              2018-04-03    ANN                2019-04-27
 816455       0              2018-03-02    ANN                2018-03-31
 816455       0              2018-03-26    ANN                2018-06-30
 410908       0              2018-03-05    ANN                2019-05-30

I would like to avoid having to use subqueries to do the aggregation on the joins if possible as I need to add the sql to a tool that doesn't support that use. Any help is appreciated!

EDIT:

As LukStorms suggested I tried "Method 1" from his answer as follows:

 SELECT count (distinct concat(G.USERID, G.EXAM_DT)) 
 + count (distinct concat(H.USERID, H.EXAM_DT)) + count (distinct 
concat(J.USERID, J.EXAM_DT)) + count (distinct concat(M.USERID, M.EXAM_DT))
 + count (distinct concat(P.USERID, P.EXAM_DT)) + count (distinct 
concat(S.USERID, S.EXAM_DT)) + count (distinct concat(V.USERID, V.EXAM_DT)) 
 + count (distinct concat(Y.USERID, Y.EXAM_DT))       AS 'Total_Unique'
 FROM PS_JOB F

 LEFT OUTER JOIN  PS_GHS_HS_ANN_EXAM H ON  F.USERID = H.USERID AND 
  H.EMPL_RCD = F.EMPL_RCD
 LEFT OUTER JOIN  PS_GHS_HS_ANTINEO G ON  F.USERID = G.USERID AND G.EMPL_RCD 
  = F.EMPL_RCD
 LEFT OUTER JOIN  PS_GHS_HS_AUDIO J ON  F.USERID = J.USERID AND J.EMPL_RCD = 
  F.EMPL_RCD 
 LEFT OUTER JOIN  PS_GHS_HS_DOT M ON  F.USERID = M.USERID AND M.EMPL_RCD = 
  F.EMPL_RCD 
 LEFT OUTER JOIN  PS_GHS_HS_HAZMAT P ON  F.USERID = P.USERID AND P.EMPL_RCD 
  = F.EMPL_RCD  
 LEFT OUTER JOIN  PS_GHS_HS_PREPLACE S ON  F.USERID = S.USERID AND S 
  .EMPL_RCD = F.EMPL_RCD  
 LEFT OUTER JOIN  PS_GH_RESP_FIT V ON  F.USERID = V.USERID AND V.EMPL_RCD = 
  F.EMPL_RCD  
 LEFT OUTER JOIN  PS_GHS_HS_ASBESTOS Y ON  F.USERID = Y.USERID  

WHERE ( ( F.EFFDT = 
    (SELECT MAX(F_ED.EFFDT) FROM PS_JOB F_ED 
    WHERE F.USERID = F_ED.USERID 
      AND F.EMPL_RCD = F_ED.EMPL_RCD 
      AND F_ED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10)) 
AND F.EFFSEQ = 
    (SELECT MAX(F_ES.EFFSEQ) FROM PS_JOB F_ES 
    WHERE F.USERID = F_ES.USERID 
      AND F.EMPL_RCD = F_ES.EMPL_RCD 
      AND F.EFFDT = F_ES.EFFDT) ))

From the above query I am getting a total count of 42, not 30. I looked at the data without the COUNT aggregation and it appears to retrieving a blank row in the tables, along with the concatenated data.

LukStorms · Accepted Answer

So you want to count distinct based on a combination of USERID and EXAM_DT?

But a count(distinct ...) only allows one field.

So then combine the 2 fields.
You can use concat for that.

Or the alternative. Group em on the date, then sum the totals.

Simplified example snippet:

declare @T table (id int identity(1,1) primary key, userid int, exam_dt datetime);

insert into @T (userid, exam_dt) values
(100, GETDATE()),(200, GETDATE()),(100, GETDATE()-1),(200, GETDATE()+0.001),(NULL,NULL);

select * from @T;

-- Method 1.1
select count(distinct concat(userid,'_',cast(exam_dt as date))) as total_unique from @T where userid is not null;

-- Method 1.2 : Adjustment because of the left joins. When there's no match then the values of the joined table would appear as NULL
select count(distinct nullif(concat(userid,'_',cast(exam_dt as date)),'_')) as total_unique from @T;

-- Method 2
select sum(total) as total_unique 
from(
    select count(distinct t.userid) as total 
    from @T t
    group by cast(t.exam_dt as date)
) q;

Returns 3.

Because userid 100 has 2 records with different dates, therefore counts as 2.
While userid 200 has 2 records with the same date, therefore counts as 1.

Simplified example snippet with joins:

declare @T  table (id int identity(1,1) primary key, userid int, empl_rcd int default 0, exam_dt date);
declare @F1 table (id int identity(1,1) primary key, userid int, empl_rcd int default 0, exam_dt date);
declare @F2 table (id int identity(1,1) primary key, userid int, empl_rcd int default 0, exam_dt date);
insert into @T  (userid, exam_dt) values (100, GETDATE()),(200, GETDATE()),(100, GETDATE()-1),(200, GETDATE()),(300, GETDATE());
insert into @F1 (userid, exam_dt) values (100, GETDATE()),(200, GETDATE()),(200, GETDATE()+1);
insert into @F2 (userid, exam_dt) values (100, GETDATE()),(300, GETDATE()+1),(300, GETDATE()+2);

select (total0 + total1 + total2) as total, q.*
from (
    select 
    count(distinct nullif(concat(t0.userid,'_',t0.exam_dt),'_')) as total0,
    count(distinct nullif(concat(f1.userid,'_',f1.exam_dt),'_')) as total1,
    count(distinct nullif(concat(f2.userid,'_',f2.exam_dt),'_')) as total2
    from @T t0
    left join @F1 f1 on (f1.userid = t0.userid and f1.empl_rcd = t0.empl_rcd)
    left join @F2 f2 on (f2.userid = t0.userid and f2.empl_rcd = t0.empl_rcd)
) q;

Select count Distinct based on combination of two columns

Answers (1)

Related Questions