Andrew
Andrew

Reputation: 1

SQL Self join with closest time stamp

I have a table on a SQL Server with millions of lab results from different people (AA,BB etc.). Some substances, for instance sodium and potassium, may be measured by two different methods (A and B).

I would now like to do some sort of "self join" based on the following criteria:

I would like to compare the results of the same substance obtained 1) from the same person, 2) with the two different methods and 3) with the closest date/time stamp

For instance, each SampleID of Sodium_A analysis should be joined with the Sodium_B analysis from the same person with the closest date/time stamp:

Source table:

select T1.DateTime, T1.PersonID, T1.SampleID, A1.Analysis, T1.Result
from [DataTable] T1 LEFT JOIN
     [AnalysisTable] A1
     ON A1.DW_SK_Analyse = T1.DW_SK_Analyse
DateTime PersonID SampleID Analysis Result
01-01-2021 10:30 AA 1 Sodium_A 10
01-01-2021 10:30 AA 1 Potassium_A 5
10-01-2021 11:30 AA 2 Sodium_A 15
10-01-2021 11:30 AA 2 Potassium_A 15
11-02-2021 12:30 AA 3 Sodium_A 20
16-03-2021 13:30 AA 4 Sodium_A 9
18-04-2021 14:30 AA 5 Sodium_A 1
02-01-2021 10:30 AA 6 Sodium_B 9
20-03-2021 13:30 AA 9 Sodium_B 11
20-04-2021 14:30 AA 10 Sodium_B 2
20-04-2021 14:30 AA 10 Potassium_B 6
23-05-2021 12:50 BB 13 Sodium_B 58
26-05-2021 11:20 BB 14 Potassium_A 11
29-05-2021 12:20 BB 15 Sodium_A 15
30-06-2021 11:20 BB 16 Sodium_B 24
30-06-2021 11:20 BB 16 Potassium_B 21

Desired result:

DateTime1 DateTime2 PersonID SampleID1 SampleID2 Analysis1 Analysis2 Result1 Result2
01-01-2021 10:30 02-01-2021 10:30 A 1 6 Sodium_A Sodium_B 10 9
10-01-2021 10:30 02-01-2021 10:30 A 2 6 Sodium_A Sodium_B 15 9
29-05-2021 12:20 23-05-2021 12:50 B 15 13 Sodium_A Sodium_B 15 58

I hope it makes sense... :-) Any idea on how to do this?

I've started with something like this, but it does not consider the time difference:

select 
T1.DateTime, T1.PersonID, T1.SampleID, A1.Analysis, T1.Result,
T2.DateTime, T2.PersonID, T2.SampleID, A2.Analysis, T2.Result 

from [DataTable] T1

LEFT JOIN  [AnalysisTable] A1  ON  A1.DW_SK_Analyse=T1.DW_SK_Analyse
INNER JOIN [DataTable] T2  ON  T1.PersonID=T2.PersonID
LEFT JOIN  [AnalysisTable] A2  ON  A2.DW_SK_Analyse=T2.DW_SK_Analyse

WHERE (select AnaType1  = 

CASE
WHEN   A1.Analysis='Sodium_A' THEN 'Sodium' 
WHEN   A1.Analysis='Potassium_A' THEN 'Potassium' 
else 'NN' end) = (select AnaType2  = 
CASE
WHEN   A2.Analysis='Sodium_B' THEN 'Sodium' 
WHEN   A2.Analysis='Potassium_B' THEN 'Potassium' 
else  'OO'
end )

Upvotes: 0

Views: 99

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269703

You can use apply:

with t as (
      select T1.DateTime, T1.PersonID, T1.SampleID, A1.Analysis, T1.Result,
             left(A1.Analysis, len(A1.Analysis) - 2) as substance,
             right(A1.Analysis, 1) as which
      from [DataTable] T1 left join
           [AnalysisTable] A1
           on A1.DW_SK_Analyse = T1.DW_SK_Analyse
     )
select t.*, t_other.*
from t outer apply 
     (select top (1) t_other.*
      from t t_other
      where t_other.personId = t.personId and
            t_other.substance = t.substance and
            t_other.which <> t.which
      order by abs(datediff(second, t.datetime, t_other.datetime))
     ) t_other;

Note that I split the "analysis" into two parts, one for the substance and one for the particular test.

Upvotes: 1

Related Questions