S..
S..

Reputation: 1272

Oracle Query - Use of Analytical functions

Assume we have loaded a flat file with patient diagnosis data into a table called “Data”. The table structure is:

Create table Data (
Firstname varchar(50),
Lastname varchar(50),
Date_of_birth datetime,
Medical_record_number varchar(20),
Diagnosis_date datetime,
Diagnosis_code varchar(20))

The data in the flat file looks like this:

'jane','jones','2/2/2001','MRN-11111','3/3/2009','diabetes'
'jane','jones','2/2/2001','MRN-11111','1/3/2009','asthma'
'jane','jones','5/5/1975','MRN-88888','2/17/2009','flu'
'tom','smith','4/12/2002','MRN-22222','3/3/2009','diabetes'
'tom','smith','4/12/2002','MRN-33333','1/3/2009','asthma'
'tom','smith','4/12/2002','MRN-33333','2/7/2009','asthma'
'jack','thomas','8/10/1991','MRN-44444','3/7/2009','asthma'

You can assume that no two patients have the same firstname, lastname, and date of birth combination. However one patient might have several visits on different days. These should all have the same medical record number. The problem is this: Tom Smith has 2 different medical record numbers. Write a query that would always show all the patients who are like Tom Smith – patients with more than one medical record number.

I came up with below query. It works perfectly fine, but wanted to know if there is a better way to write this query using Oracle Analytical function's. Thank you in advance

SELECT   a.firstname,
         a.lastname,
         a.date_of_birth,
         a.medical_record_number
FROM     data a, data b
WHERE        a.firstname = b.firstname
         AND a.lastname = b.lastname
         AND a.date_of_birth = b.date_of_birth
         AND a.medical_record_number <> .medical_record_number
GROUP BY a.firstname,
         a.lastname,
         a.date_of_birth,
         a.medical_record_number

Upvotes: 1

Views: 189

Answers (2)

Boneist
Boneist

Reputation: 23578

It is possible to do via analytic functions, but whether it's faster than doing the join in your query* or not depends on what data you have. You'd need to test.

with data (firstname, lastname, date_of_birth, medical_record_number, diagnosis_date, diagnosis_code)
          as (select 'jane','jones','2/2/2001','MRN-11111',to_date('3/3/2009', 'mm/dd/yyyy'),'diabetes' from dual union all
              select 'jane','jones','2/2/2001','MRN-11111',to_date('1/3/2009', 'mm/dd/yyyy'),'asthma' from dual union all
              select 'jane','jones','5/5/1975','MRN-88888',to_date('2/17/2009', 'mm/dd/yyyy'),'flu' from dual union all
              select 'tom','smith','4/12/2002','MRN-22222',to_date('3/3/2009', 'mm/dd/yyyy'),'diabetes' from dual union all
              select 'tom','smith','4/12/2002','MRN-33333',to_date('1/3/2009', 'mm/dd/yyyy'),'asthma' from dual union all
              select 'tom','smith','4/12/2002','MRN-33333',to_date('2/7/2009', 'mm/dd/yyyy'),'asthma' from dual union all
              select 'jack','thomas','8/10/1991','MRN-44444',to_date('3/7/2009', 'mm/dd/yyyy'),'asthma' from dual),
-- end of mimicking your table and its data
      res as (select firstname,
                     lastname,
                     date_of_birth,
                     medical_record_number,
                     count(distinct medical_record_number) over (partition by firstname, lastname, date_of_birth) cnt_med_rec_nums
              from   data)
select distinct firstname,
                lastname,
                date_of_birth,
                medical_record_number
from   res
where  cnt_med_rec_nums > 1;

*btw, the group by in your example query is not necessary; it would make much more sense to switch it out for a distinct - it makes your intent much clearer, since you're wanting to get a distinct set of records.

Upvotes: 1

Justin Cave
Justin Cave

Reputation: 231661

You can probably simplify the query a bit using a HAVING clause rather than doing a self-join

SELECT   a.firstname,
         a.lastname,
         a.date_of_birth,
         MIN(a.medical_record_number) lowest_medical_record_number,
         MAX(a.medical_record_number) highest_medical_record_number
FROM     data a
GROUP BY a.firstname,
         a.lastname,
         a.date_of_birth
HAVING   COUNT( DISTINCT a.medical_record_number ) > 1

I'm returning the smallest and largest medical record number for each patient here (that's what I'd do if most of the patients with this problem have just two numbers rather than having dozens). You could return just one or you could return a comma-separated list of all the medical record numbers if you'd rather (which would probably make more sense if most of the bad folks have dozens of numbers).

Upvotes: 0

Related Questions