Reputation: 305
Here's the scenario: In a SQL Server 2008 R2 database, Table A has StudyID, VisitCode and VisitSequenceNumber. Table B has StudyID, SubjectID and VisitCode.
Table A has all possible VisitCodes for a given StudyID (There are over 200 StudyIDs in this table, each with its own set of VisitCodes).
Table B has all SubjectIDs for a given StudyID, as well as all VisitIDs for each subject (There are over 200 StudyIDs in this table, each with its own set of SubjectIDs).
I need to create a list of all SubjectIDs for a StudyID, aling with all the VisitIDs that the SubjectID does NOT have. I can't just make a list of SubjectIDs with missing visits, I need to identify which visits are missing for each subject.
So if for StudyID 'C1234' Table A has this:
StudyID | VisitCode | VisitSequenceNumber C1234 | V100A | 100 C1234 | V110A | 110 C1234 | V120A | 120 C1234 | UNS | 999
And Table B has:
StudyID |SubjectID | VisitCode C1234 | 01-001 | V100A C1234 | 01-001 | V120A C1234 | 01-001 | UNS C1234 | 01-002 | V110A C1234 | 01-002 | UNS
I need to return rows that would contain the following:
StudyID | SubjectID | VisitCode C1234 | 01-001 | V110A C1234 | 01-002 | V100A C1234 | 01-002 | V120A
For some reason I just can't seem to work this one out - how do I match up the Subjects with the missing Visits when they are by definition not in Table B for those Subjects? Any help will be much appreciated!
Upvotes: 1
Views: 86
Reputation: 28196
You have to introduce some kind of master list for the required subjectID
s and visitCodes
.
Edit
I get all possible values for studyID
from tableB (here: limited to 'C1234'), all values for subjectID
from tableB again and all values for visitCodes
from tableA. After that I run a join between all these possible values and check whether the combination already exists in tableB
SELECT sti studyId, sid studyId, vc visitCode
FROM
( SELECT DISTINCT studyID sti FROM tableB ) stis
INNER JOIN
( SELECT DISTINCT StudyID ssi,subjectID sid FROM tableB) s ON ssi=sti
INNER JOIN
( SELECT DISTINCT StudyID vsi,visitCode vc FROM tableA ) v ON vsi=sti
WHERE NOT EXISTS (SELECT 1 FROM tableB
WHERE StudyID=sti AND subjectID=sid AND visitCode=vc)
AND sti = 'C1234' -- to limit the example to the current study
See here for a working demo (MySQL): http://sqlfiddle.com/#!9/cc91b/11
or here (T-SQL 2014): demo on data.stackexchange
Upvotes: 1
Reputation: 44881
One way is to use cross join
in a derived table to generate all possible combinations of StudyID, VisitCode and SubjectId and then left join with that set and filter for null to find the missing rows:
select all_combo.studyid, all_combo.subjectid, all_combo.visitcode
from (
select a.studyid, a.visitcode, b.subjectid
from tableb b
cross join tablea a
group by a.studyid, a.visitcode, b.subjectid
) all_combo
left join tableb b
on all_combo.VisitCode = b.VisitCode
and all_combo.StudyID = b.StudyID
and all_combo.SubjectID = b.SubjectID
where b.StudyID is null
and all_combo.StudyID = 'C1234' -- you might have to limit to the specific StudyID
order by all_combo.SubjectID;
Result with your sample data:
| studyid | subjectid | visitcode |
|---------|-----------|-----------|
| C1234 | 01-001 | V110A |
| C1234 | 01-002 | V100A |
| C1234 | 01-002 | V120A |
Upvotes: 4
Reputation: 52645
One of the things that makes this difficult is you don't have a table that defindes which subjects are associated with a studyid so we have to derive it from b. Using a CTE will do that. Also the JOIN to tableB is wierd because it uses the distinct set of {StudyId, subjectID} as well as the possible visits
WITH subjects
AS (SELECT DISTINCT studyid,
subjectid
FROM tableb)
SELECT s.studyid,
s.subjectid,
a.visitcode
FROM subjects s
INNER JOIN tablea a
ON s.studyid = a.studyid
LEFT JOIN tableb b
ON a.studyid = b.studyid
AND a.visitcode = b.visitcode
AND s.subjectid = b.subjectid
WHERE b.studyid IS NULL
ORDER BY s.studyid,
s.subjectid
Upvotes: 2
Reputation: 1491
You are trying to find out the missing visits for subjects, I believe there would be another table for Subject e.g. TableC and would have the relation to study, so joining the TableA and TableC will give the super set of subjects x visits, minus the TableB with this and you get the missing visits for the subjects.
SELECT StudyID
, SubjectID
, VisitCode
FROM (SELECT a.StudyID
, c.SubjectID
, a.VisitCode
FROM TableA a
LEFT JOIN TableC c ON a.StudyID = c.StudyID
EXCEPT
SELECT *
FROM TableB)
WHERE StudyID = 'C1234'
Upvotes: 1