Reputation: 783
I have a table with 3 columns. I need to generate combinations (always in 3's) of the values (in "tests"column) of the table that were performed based on a particular "diagnosis". However, it is possible that a particular diagnosis has 2 or lesser tests, in which case the logic would still output that combination, albeit with 2 values. With reference to the table below, for each cust_id there is a "diagnosis" column based on which the "tests" were performed. Now for every diagnosis value group, I need to generate unique combinations of the corresponding values in the "tests" column. Note, the combinations should always be with 3 values ( where values >= 3), but for diagnosis which have lesser than 3 values (1 or 2), the corresponding combination should still be output (with the available 1 or 2 values and substitute Null in place of the unavailable value).
PATIENT:
pat_id | diagnosis | tests
1001 | Thyroid | CAT
1001 | Thyroid | MRI
1001 | Thyroid | Blood
1001 | Tonsil | CAT
1001 | Tonsil | MRI
1001 | Tonsil | Blood
1001 | Tonsil | RAPID
1002 | Pneumonia | MRI
1002 | Pneumonia | Eliza
1003 | Bronchitis | X-Ray
So, for pat_id = '1001'
and diagnosis = 'Thyroid'
we see that "tests" have 3 distinct values. Hence, only 1 unique combination is possible i.e. {CAT, MRI, Blood}
. Similarly, for pat_id = '1001' and diagnosis = 'Tonsil'
, we see there are 4 distinct values in "test" column. Hence, there will be 4 combinations i.e. {CAT, MRI, Blood}, {CAT, MRI, RAPID}, {MRI, Blood, RAPID} & {CAT, blood, RAPID}
. For pat_id = '1002'
only two unique values exists. Hence the combination will be only 1 i.e. {MRI, Eliza}
. Similarly for, pat_id = '1003'
has only 1 value i.e. X-Ray, hence the output should be {X-Ray}
for '1003'
.
Like this, I need to generate similar combinations for all diagnosis values in a group and finally, identify the unique combination that occurs the max number of times in that table. The output should be the combination that occurs the most in the table.
So far, the sql below returns all the combinations that have 3 or more values. But it is failing to output the ones which have lesser than 3 values. Meaning, 1002 & 1003
are not being output since they have lesser than 3 values but needs to be output. The solution needs to handle such cases as well.
select p1.pat_id, p1.diagnosis, p1.tests, p2.tests, p3.tests
from patient p1 join
patient p2
on p1.pat_id = p2.pat_id and p1.diagnosis = p2.diagnosis and
p1.tests < p2.tests join
patient p3
on p2.pat_id = p3.pat_id and p2.diagnosis = p3.diagnosis and
p2.tests < p3.tests ;
Also please indicate how I can identify which combination occurs the most. Thanks.
Upvotes: 1
Views: 39
Reputation: 31772
You can use your query with left joins to allow the second and the third tests to be NULL. But then you will need to remove rows with NULLs for groups with two or more tests. You can use correlated (dependent) COUNT(*)
subquery to achieve that:
select
p1.pat_id,
p1.diagnosis,
p1.tests as test1,
p2.tests as test2,
p3.tests as test3
from patient p1
left join patient p2
on p2.diagnosis = p1.diagnosis
and p2.pat_id = p1.pat_id
and p2.tests > p1.tests
left join patient p3
on p3.diagnosis = p1.diagnosis
and p3.pat_id = p1.pat_id
and p3.tests > p2.tests
where
case (
select count(*)
from patient p
where p.diagnosis = p1.diagnosis
and p.pat_id = p1.pat_id
)
when 1 then true
when 2 then p2.tests is not null
else p3.tests is not null
end
order by p1.pat_id, p1.diagnosis
Result:
| pat_id | diagnosis | test1 | test2 | test3 |
| ------ | ---------- | ----- | ----- | ----- |
| 1001 | Thyroid | Blood | CAT | MRI |
| 1001 | Tonsil | CAT | MRI | RAPID |
| 1001 | Tonsil | Blood | MRI | RAPID |
| 1001 | Tonsil | Blood | CAT | MRI |
| 1001 | Tonsil | Blood | CAT | RAPID |
| 1002 | Pneumonia | Eliza | MRI | |
| 1003 | Bronchitis | X-Ray | | |
To order distinct combinations by the number of their occurences, you just need to modify it to a GROUP BY ... ORDER BY COUNT(*)
query:
select
p1.tests as test1,
p2.tests as test2,
p3.tests as test3,
count(*) as cnt
from patient p1
left join patient p2
on p2.diagnosis = p1.diagnosis
and p2.pat_id = p1.pat_id
and p2.tests > p1.tests
left join patient p3
on p3.diagnosis = p1.diagnosis
and p3.pat_id = p1.pat_id
and p3.tests > p2.tests
where
case (
select count(*)
from patient p
where p.diagnosis = p1.diagnosis
and p.pat_id = p1.pat_id
)
when 1 then true
when 2 then p2.tests is not null
else p3.tests is not null
end
group by p1.tests, p2.tests, p3.tests
order by cnt desc
Result:
| test1 | test2 | test3 | cnt |
| ----- | ----- | ----- | --- |
| Blood | CAT | MRI | 2 |
| CAT | MRI | RAPID | 1 |
| Blood | MRI | RAPID | 1 |
| Eliza | MRI | | 1 |
| X-Ray | | | 1 |
| Blood | CAT | RAPID | 1 |
Upvotes: 1
Reputation: 1269543
I think you want left join
and group by
:
select p1.tests, p2.tests, p3.tests, count(*)
from patient p1 left join
patient p2
on p1.pat_id = p2.pat_id and p1.diagnosis = p2.diagnosis and
p1.tests < p2.tests left join
patient p3
on p2.pat_id = p3.pat_id and p2.diagnosis = p3.diagnosis and
p2.tests < p3.tests
group by p1.tests, p2.tests, p3.tests
order by count(*) desc;
I am not sure if diagnosis
should also be part of the result set. As you describe the results, it does not seem to be, but to me it would make sense.
Upvotes: 0