Reputation: 1
I have a table on a SQL Server with millions of lab results from different people (AA,BB etc.). Some substances, for instance sodium and potassium, may be measured by two different methods (A and B).
I would now like to do some sort of "self join" based on the following criteria:
I would like to compare the results of the same substance obtained 1) from the same person, 2) with the two different methods and 3) with the closest date/time stamp
For instance, each SampleID of Sodium_A analysis should be joined with the Sodium_B analysis from the same person with the closest date/time stamp:
Source table:
select T1.DateTime, T1.PersonID, T1.SampleID, A1.Analysis, T1.Result
from [DataTable] T1 LEFT JOIN
[AnalysisTable] A1
ON A1.DW_SK_Analyse = T1.DW_SK_Analyse
DateTime | PersonID | SampleID | Analysis | Result |
---|---|---|---|---|
01-01-2021 10:30 | AA | 1 | Sodium_A | 10 |
01-01-2021 10:30 | AA | 1 | Potassium_A | 5 |
10-01-2021 11:30 | AA | 2 | Sodium_A | 15 |
10-01-2021 11:30 | AA | 2 | Potassium_A | 15 |
11-02-2021 12:30 | AA | 3 | Sodium_A | 20 |
16-03-2021 13:30 | AA | 4 | Sodium_A | 9 |
18-04-2021 14:30 | AA | 5 | Sodium_A | 1 |
02-01-2021 10:30 | AA | 6 | Sodium_B | 9 |
20-03-2021 13:30 | AA | 9 | Sodium_B | 11 |
20-04-2021 14:30 | AA | 10 | Sodium_B | 2 |
20-04-2021 14:30 | AA | 10 | Potassium_B | 6 |
23-05-2021 12:50 | BB | 13 | Sodium_B | 58 |
26-05-2021 11:20 | BB | 14 | Potassium_A | 11 |
29-05-2021 12:20 | BB | 15 | Sodium_A | 15 |
30-06-2021 11:20 | BB | 16 | Sodium_B | 24 |
30-06-2021 11:20 | BB | 16 | Potassium_B | 21 |
Desired result:
DateTime1 | DateTime2 | PersonID | SampleID1 | SampleID2 | Analysis1 | Analysis2 | Result1 | Result2 |
---|---|---|---|---|---|---|---|---|
01-01-2021 10:30 | 02-01-2021 10:30 | A | 1 | 6 | Sodium_A | Sodium_B | 10 | 9 |
10-01-2021 10:30 | 02-01-2021 10:30 | A | 2 | 6 | Sodium_A | Sodium_B | 15 | 9 |
29-05-2021 12:20 | 23-05-2021 12:50 | B | 15 | 13 | Sodium_A | Sodium_B | 15 | 58 |
I hope it makes sense... :-) Any idea on how to do this?
I've started with something like this, but it does not consider the time difference:
select
T1.DateTime, T1.PersonID, T1.SampleID, A1.Analysis, T1.Result,
T2.DateTime, T2.PersonID, T2.SampleID, A2.Analysis, T2.Result
from [DataTable] T1
LEFT JOIN [AnalysisTable] A1 ON A1.DW_SK_Analyse=T1.DW_SK_Analyse
INNER JOIN [DataTable] T2 ON T1.PersonID=T2.PersonID
LEFT JOIN [AnalysisTable] A2 ON A2.DW_SK_Analyse=T2.DW_SK_Analyse
WHERE (select AnaType1 =
CASE
WHEN A1.Analysis='Sodium_A' THEN 'Sodium'
WHEN A1.Analysis='Potassium_A' THEN 'Potassium'
else 'NN' end) = (select AnaType2 =
CASE
WHEN A2.Analysis='Sodium_B' THEN 'Sodium'
WHEN A2.Analysis='Potassium_B' THEN 'Potassium'
else 'OO'
end )
Upvotes: 0
Views: 99
Reputation: 1269703
You can use apply
:
with t as (
select T1.DateTime, T1.PersonID, T1.SampleID, A1.Analysis, T1.Result,
left(A1.Analysis, len(A1.Analysis) - 2) as substance,
right(A1.Analysis, 1) as which
from [DataTable] T1 left join
[AnalysisTable] A1
on A1.DW_SK_Analyse = T1.DW_SK_Analyse
)
select t.*, t_other.*
from t outer apply
(select top (1) t_other.*
from t t_other
where t_other.personId = t.personId and
t_other.substance = t.substance and
t_other.which <> t.which
order by abs(datediff(second, t.datetime, t_other.datetime))
) t_other;
Note that I split the "analysis" into two parts, one for the substance and one for the particular test.
Upvotes: 1