Reputation: 372
I have the below query that gets counts of distinct USERID's from the various tables and sums them to a grand total. I am expecting a total of 35 as the results, however I am only getting 30 as a result from this query. What it appears to be doing is when it finds the same USERID in more than one row in any table, it is counting them only once (It is fine that USERID's appear more than once in a table based on how it was structured).
I would like to get Distinct values based on the combination of USERID and EXAM_DT, as this combination will satisfy the uniqueness I need.
SQL:
SELECT 'TOTAL', '', COUNT (DISTINCT G.USERID) + COUNT (DISTINCT H.USERID) +
COUNT (DISTINCT J.USERID) + COUNT (DISTINCT M.USERID) + COUNT (DISTINCT
P.USERID) + COUNT(DISTINCT S.USERID) + COUNT (DISTINCT V.USERID) + COUNT (
DISTINCT Y.USERID)
FROM PS_JOB F INNER JOIN PS_EMPLMT_SRCH_QRY F1 ON (F.USERID =
F1.USERID AND F.EMPL_RCD = F1.EMPL_RCD )
LEFT OUTER JOIN PS_GHS_HS_ANN_EXAM G ON F.USERID = G.USERID AND G.EMPL_RCD
= F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_ANTINEO H ON F.USERID = H.USERID AND H.EMPL_RCD
= F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_AUDIO J ON F.USERID = J.USERID AND J.EMPL_RCD =
F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_DOT M ON F.USERID = M.USERID AND M.EMPL_RCD =
F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_HAZMAT P ON F.USERID = P.USERID AND P.EMPL_RCD =
F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_PREPLACE S ON F.USERID = S.USERID AND S.EMPL_RCD
= F.EMPL_RCD
LEFT OUTER JOIN PS_GH_RESP_FIT V ON F.USERID = V.USERID AND V.EMPL_RCD =
F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_ASBESTOS Y ON F.USERID = Y.USERID AND Y.USERID =
F.USERID
WHERE ( ( F.EFFDT =
(SELECT MAX(F_ED.EFFDT) FROM PS_JOB F_ED
WHERE F.USERID = F_ED.USERID
AND F.EMPL_RCD = F_ED.EMPL_RCD
AND F_ED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10))
AND F.EFFSEQ =
(SELECT MAX(F_ES.EFFSEQ) FROM PS_JOB F_ES
WHERE F.USERID = F_ES.USERID
AND F.EMPL_RCD = F_ES.EMPL_RCD
AND F.EFFDT = F_ES.EFFDT) ))
My results:
(No column name) (No column name) (No column name)
TOTAL 30
Here is an example from one of the tables in the query that contains the USERID 816455 twice, but only counting (in above query) one distinct occurrence of it (when I need the distinct to be based on the combination of USERID and EXAM_DT)
USERID USER_RCD EXAM_DT EXAM_TYPE_CD EXPIRE_DT
001 0 2018-04-17 ANN 2019-04-17
03 0 2018-04-03 ANN 2019-04-27
816455 0 2018-03-02 ANN 2018-03-31
816455 0 2018-03-26 ANN 2018-06-30
410908 0 2018-03-05 ANN 2019-05-30
I would like to avoid having to use subqueries to do the aggregation on the joins if possible as I need to add the sql to a tool that doesn't support that use. Any help is appreciated!
EDIT:
As LukStorms suggested I tried "Method 1" from his answer as follows:
SELECT count (distinct concat(G.USERID, G.EXAM_DT))
+ count (distinct concat(H.USERID, H.EXAM_DT)) + count (distinct
concat(J.USERID, J.EXAM_DT)) + count (distinct concat(M.USERID, M.EXAM_DT))
+ count (distinct concat(P.USERID, P.EXAM_DT)) + count (distinct
concat(S.USERID, S.EXAM_DT)) + count (distinct concat(V.USERID, V.EXAM_DT))
+ count (distinct concat(Y.USERID, Y.EXAM_DT)) AS 'Total_Unique'
FROM PS_JOB F
LEFT OUTER JOIN PS_GHS_HS_ANN_EXAM H ON F.USERID = H.USERID AND
H.EMPL_RCD = F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_ANTINEO G ON F.USERID = G.USERID AND G.EMPL_RCD
= F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_AUDIO J ON F.USERID = J.USERID AND J.EMPL_RCD =
F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_DOT M ON F.USERID = M.USERID AND M.EMPL_RCD =
F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_HAZMAT P ON F.USERID = P.USERID AND P.EMPL_RCD
= F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_PREPLACE S ON F.USERID = S.USERID AND S
.EMPL_RCD = F.EMPL_RCD
LEFT OUTER JOIN PS_GH_RESP_FIT V ON F.USERID = V.USERID AND V.EMPL_RCD =
F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_ASBESTOS Y ON F.USERID = Y.USERID
WHERE ( ( F.EFFDT =
(SELECT MAX(F_ED.EFFDT) FROM PS_JOB F_ED
WHERE F.USERID = F_ED.USERID
AND F.EMPL_RCD = F_ED.EMPL_RCD
AND F_ED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10))
AND F.EFFSEQ =
(SELECT MAX(F_ES.EFFSEQ) FROM PS_JOB F_ES
WHERE F.USERID = F_ES.USERID
AND F.EMPL_RCD = F_ES.EMPL_RCD
AND F.EFFDT = F_ES.EFFDT) ))
From the above query I am getting a total count of 42, not 30. I looked at the data without the COUNT aggregation and it appears to retrieving a blank row in the tables, along with the concatenated data.
Upvotes: 0
Views: 1819
Reputation: 29647
So you want to count distinct based on a combination of USERID and EXAM_DT?
But a count(distinct ...)
only allows one field.
So then combine the 2 fields.
You can use concat for that.
Or the alternative. Group em on the date, then sum the totals.
Simplified example snippet:
declare @T table (id int identity(1,1) primary key, userid int, exam_dt datetime);
insert into @T (userid, exam_dt) values
(100, GETDATE()),(200, GETDATE()),(100, GETDATE()-1),(200, GETDATE()+0.001),(NULL,NULL);
select * from @T;
-- Method 1.1
select count(distinct concat(userid,'_',cast(exam_dt as date))) as total_unique from @T where userid is not null;
-- Method 1.2 : Adjustment because of the left joins. When there's no match then the values of the joined table would appear as NULL
select count(distinct nullif(concat(userid,'_',cast(exam_dt as date)),'_')) as total_unique from @T;
-- Method 2
select sum(total) as total_unique
from(
select count(distinct t.userid) as total
from @T t
group by cast(t.exam_dt as date)
) q;
Returns 3.
Because userid 100 has 2 records with different dates, therefore counts as 2.
While userid 200 has 2 records with the same date, therefore counts as 1.
Simplified example snippet with joins:
declare @T table (id int identity(1,1) primary key, userid int, empl_rcd int default 0, exam_dt date);
declare @F1 table (id int identity(1,1) primary key, userid int, empl_rcd int default 0, exam_dt date);
declare @F2 table (id int identity(1,1) primary key, userid int, empl_rcd int default 0, exam_dt date);
insert into @T (userid, exam_dt) values (100, GETDATE()),(200, GETDATE()),(100, GETDATE()-1),(200, GETDATE()),(300, GETDATE());
insert into @F1 (userid, exam_dt) values (100, GETDATE()),(200, GETDATE()),(200, GETDATE()+1);
insert into @F2 (userid, exam_dt) values (100, GETDATE()),(300, GETDATE()+1),(300, GETDATE()+2);
select (total0 + total1 + total2) as total, q.*
from (
select
count(distinct nullif(concat(t0.userid,'_',t0.exam_dt),'_')) as total0,
count(distinct nullif(concat(f1.userid,'_',f1.exam_dt),'_')) as total1,
count(distinct nullif(concat(f2.userid,'_',f2.exam_dt),'_')) as total2
from @T t0
left join @F1 f1 on (f1.userid = t0.userid and f1.empl_rcd = t0.empl_rcd)
left join @F2 f2 on (f2.userid = t0.userid and f2.empl_rcd = t0.empl_rcd)
) q;
Upvotes: 2