JeffK627
JeffK627

Reputation: 305

SQL Server get non-matching values from two tables

Here's the scenario: In a SQL Server 2008 R2 database, Table A has StudyID, VisitCode and VisitSequenceNumber. Table B has StudyID, SubjectID and VisitCode.

Table A has all possible VisitCodes for a given StudyID (There are over 200 StudyIDs in this table, each with its own set of VisitCodes).

Table B has all SubjectIDs for a given StudyID, as well as all VisitIDs for each subject (There are over 200 StudyIDs in this table, each with its own set of SubjectIDs).

I need to create a list of all SubjectIDs for a StudyID, aling with all the VisitIDs that the SubjectID does NOT have. I can't just make a list of SubjectIDs with missing visits, I need to identify which visits are missing for each subject.

So if for StudyID 'C1234' Table A has this:

StudyID | VisitCode | VisitSequenceNumber
C1234   | V100A     | 100
C1234   | V110A     | 110
C1234   | V120A     | 120
C1234   | UNS       | 999

And Table B has:

StudyID |SubjectID | VisitCode
C1234   | 01-001   | V100A
C1234   | 01-001   | V120A
C1234   | 01-001   | UNS
C1234   | 01-002   | V110A
C1234   | 01-002   | UNS

I need to return rows that would contain the following:

StudyID | SubjectID | VisitCode
C1234   | 01-001    | V110A
C1234   | 01-002    | V100A
C1234   | 01-002    | V120A

For some reason I just can't seem to work this one out - how do I match up the Subjects with the missing Visits when they are by definition not in Table B for those Subjects? Any help will be much appreciated!

Upvotes: 1

Views: 86

Answers (4)

Carsten Massmann
Carsten Massmann

Reputation: 28196

You have to introduce some kind of master list for the required subjectIDs and visitCodes.

Edit

I get all possible values for studyID from tableB (here: limited to 'C1234'), all values for subjectID from tableB again and all values for visitCodes from tableA. After that I run a join between all these possible values and check whether the combination already exists in tableB

SELECT sti studyId, sid studyId, vc visitCode
FROM 
 ( SELECT DISTINCT studyID sti FROM tableB ) stis
 INNER JOIN 
 ( SELECT DISTINCT StudyID ssi,subjectID sid FROM tableB) s ON ssi=sti
 INNER JOIN 
 ( SELECT DISTINCT StudyID vsi,visitCode vc FROM tableA ) v ON vsi=sti
WHERE NOT EXISTS (SELECT 1 FROM tableB  
      WHERE StudyID=sti AND subjectID=sid AND visitCode=vc)
AND sti = 'C1234' -- to limit the example to the current study

See here for a working demo (MySQL): http://sqlfiddle.com/#!9/cc91b/11
or here (T-SQL 2014): demo on data.stackexchange

Upvotes: 1

jpw
jpw

Reputation: 44881

One way is to use cross join in a derived table to generate all possible combinations of StudyID, VisitCode and SubjectId and then left join with that set and filter for null to find the missing rows:

select all_combo.studyid, all_combo.subjectid, all_combo.visitcode
from (
    select a.studyid, a.visitcode, b.subjectid 
    from tableb b 
    cross join tablea a 
    group by a.studyid, a.visitcode, b.subjectid 
) all_combo
left join tableb b 
 on all_combo.VisitCode = b.VisitCode 
and all_combo.StudyID = b.StudyID 
and all_combo.SubjectID = b.SubjectID 
where b.StudyID is null
  and all_combo.StudyID = 'C1234' -- you might have to limit to the specific StudyID
order by all_combo.SubjectID;

Sample SQL Fiddle

Result with your sample data:

| studyid | subjectid | visitcode |
|---------|-----------|-----------|
|   C1234 |    01-001 |    V110A  |
|   C1234 |    01-002 |    V100A  |
|   C1234 |    01-002 |    V120A  |

Upvotes: 4

Conrad Frix
Conrad Frix

Reputation: 52645

One of the things that makes this difficult is you don't have a table that defindes which subjects are associated with a studyid so we have to derive it from b. Using a CTE will do that. Also the JOIN to tableB is wierd because it uses the distinct set of {StudyId, subjectID} as well as the possible visits

WITH subjects 
     AS (SELECT DISTINCT studyid, 
                         subjectid 
         FROM   tableb) 
SELECT s.studyid, 
       s.subjectid, 
       a.visitcode 
FROM   subjects s 
       INNER JOIN tablea a 
               ON s.studyid = a.studyid 
       LEFT JOIN tableb b 
              ON a.studyid = b.studyid 
                 AND a.visitcode = b.visitcode 
                 AND s.subjectid = b.subjectid 
WHERE  b.studyid IS NULL 
ORDER  BY s.studyid, 
          s.subjectid 

DEMO

Upvotes: 2

Techie
Techie

Reputation: 1491

You are trying to find out the missing visits for subjects, I believe there would be another table for Subject e.g. TableC and would have the relation to study, so joining the TableA and TableC will give the super set of subjects x visits, minus the TableB with this and you get the missing visits for the subjects.

SELECT StudyID
     , SubjectID
     , VisitCode
  FROM (SELECT a.StudyID
             , c.SubjectID
             , a.VisitCode
          FROM TableA a 
          LEFT JOIN TableC c ON a.StudyID = c.StudyID
        EXCEPT
        SELECT * 
          FROM TableB)
 WHERE StudyID = 'C1234'

Upvotes: 1

Related Questions