Flayckz
Flayckz

Reputation: 77

Improve SQL Query to find redundant data

the following shows my sample dataset

PatientID        PatientName
XXX-037070002    Riger, Jens^Wicki
XXX-037070002    Riger^Wicki
XXX-10052        Weier,Nicole^Peggy
XXX-10052        Weier,Nicole^Peppy
XXX-23310        Rodem^Sieglinde
XXX-23310        Sauberger, Birgit^Finja
XXX-23343        Je, Ronny^Wilma
XXX-23343        Jer, Ronny^Wilma
XXX-2349         Kel,Andy^Juka
XXX-2349         Kel^Juka
XXX-2998         Hel, Frank
XXX-2998         Hel,Frank^Fenris
XXX-3188         Mey, Marion
XXX-3188         Mey, Marion^Paula
XXX-3188         Schulz^Roma
XXX-3218         Böntgen-Simnet,Dr. Regine^Cara
XXX-3218         Simnet,Dr. Regine^Cara
XXX-3826         Mertes, Bernd Uwe^Ellie
XXX-3826         Mertes,Bernd^Ellie
XXX-3826         Mertes^Ellie

This is the query I got from my last request:

with d as
(   
select distinct
    patid,
    patname
from dicomstudys
)
select *
from d
where d.patid in
(   
select d.patid
from d
group by d.patid
having count(*) > 1
)

Now I want to adjust the query that only the following data get's an output:

PatientID        PatientName
XXX-23310        Rodem^Sieglinde
XXX-23310        Sauberger, Birgit^Finja
XXX-23343        Je, Ronny^Wilma
XXX-23343        Jer, Ronny^Wilma
XXX-3188         Mey, Marion
XXX-3188         Mey, Marion^Paula
XXX-3188         Schulz^Roma
XXX-3218         Böntgen-Simnet,Dr. Regine^Cara
XXX-3218         Simnet,Dr. Regine^Cara

Last names are either seperated with a ',' or '^' . If last names are the same for the same PatientID then I dont want them being displayed. I tried fiddling with a sub select statement featuring a combination of CHARINDEX commands and others but my SQL syntax knowledge is very limited with the complexity of the request.

Please also note that for the case for XXX-3188 has two datasets with the same last name but also another dataset with a complete new patientName and thus it needs to be in the output.

Upvotes: 1

Views: 106

Answers (2)

gotqn
gotqn

Reputation: 43626

Try this:

DECLARE @DataSource TABLE
(
    [ID] VARCHAR(32)
   ,[Name] VARCHAR(256)
);

INSERT INTO @DataSource ([ID], [Name])
VALUES ('XXX-037070002', 'Riger, Jens^Wicki')
      ,('XXX-037070002', 'Riger^Wicki')
      ,('XXX-10052', 'Weier,Nicole^Peggy')
      ,('XXX-10052', 'Weier,Nicole^Peppy')
      ,('XXX-23310', 'Rodem^Sieglinde')
      ,('XXX-23310', 'Sauberger, Birgit^Finja')
      ,('XXX-23343', 'Je, Ronny^Wilma')
      ,('XXX-23343', 'Jer, Ronny^Wilma')
      ,('XXX-2349', 'Kel,Andy^Juka')
      ,('XXX-2349', 'Kel^Juka')
      ,('XXX-2998', 'Hel, Frank')
      ,('XXX-2998', 'Hel,Frank^Fenris')
      ,('XXX-3188', 'Mey, Marion')
      ,('XXX-3188', 'Mey, Marion^Paula')
      ,('XXX-3188', 'Schulz^Roma')
      ,('XXX-3218', 'Böntgen-Simnet,Dr. Regine^Cara')
      ,('XXX-3218', 'Simnet,Dr. Regine^Cara')
      ,('XXX-3826', 'Mertes, Bernd Uwe^Ellie')
      ,('XXX-3826', 'Mertes,Bernd^Ellie')
      ,('XXX-3826', 'Mertes^Ellie');

WITH DataSource AS
(
    SELECT [ID]
          ,[Name]
          ,COUNT(*) OVER (PARTITION BY [ID], LTRIM(RTRIM(SUBSTRING([Name],  0, CHARINDEX(',', REPLACE([Name], '^', ',')))))) AS [ID_Name_Count]
          ,COUNT(*) OVER (PARTITION BY [ID]) AS [ID_Count]
          ,LTRIM(RTRIM(SUBSTRING([Name],  0, CHARINDEX(',', REPLACE([Name], '^', ','))))) AS [FamilyName]
    FROM @DataSource
)
SELECT [ID]
      ,[Name]
FROM DataSource
WHERE [ID_Name_Count] = 1
    AND [ID_Count] = 2
    OR [ID] IN
    (
        SELECT [ID]
        FROM DataSource
        GROUP BY [ID]
        HAVING COUNT(DISTINCT [FamilyName]) > 1
    );

enter image description here

Тhe solution is pretty easy. Here are the interesting parts:

  • replace the ^ with , in order to simplify the last name extraction
  • extract the last name and calculation count based on ID and last name
  • in the final select check for unique id-last name pairs with id count equal to 2 and add ids with more then one unique family names (your special case)

Upvotes: 1

Jacek Wróbel
Jacek Wróbel

Reputation: 1222

You can try something like that:

Test data

drop table if exists #Patient;

create table #Patient (
    PatientID varchar(20),
    PatientName varchar(50)
);

insert into #Patient(PatientID,PatientName) 
    values  ('XXX-037070002'   ,'Riger, Jens^Wicki'),
            ('XXX-037070002'   ,'Riger^Wicki'),
            ('XXX-10052'       ,'Weier,Nicole^Peggy'),
            ('XXX-10052'       ,'Weier,Nicole^Peppy'),
            ('XXX-23310'       ,'Rodem^Sieglinde'),
            ('XXX-23310'       ,'Sauberger, Birgit^Finja'),
            ('XXX-23343'       ,'Je, Ronny^Wilma'),
            ('XXX-23343'       ,'Jer, Ronny^Wilma'),
            ('XXX-2349'        ,'Kel,Andy^Juka'),
            ('XXX-2349'        ,'Kel^Juka'),
            ('XXX-2998'        ,'Hel, Frank'),
            ('XXX-2998'        ,'Hel,Frank^Fenris'),
            ('XXX-3188'        ,'Mey, Marion'),
            ('XXX-3188'        ,'Mey, Marion^Paula'),
            ('XXX-3188'        ,'Schulz^Roma'),
            ('XXX-3218'        ,'Böntgen-Simnet,Dr. Regine^Cara'),
            ('XXX-3218'        ,'Simnet,Dr. Regine^Cara'),
            ('XXX-3826'        ,'Mertes, Bernd Uwe^Ellie'),
            ('XXX-3826'        ,'Mertes,Bernd^Ellie'),
            ('XXX-3826'        ,'Mertes^Ellie');

My solution

with q1 as (
select 
    PatientID,
    PatientName,
    case when CHARINDEX(',',REPLACE( PatientName, '^',',')) > 0 
        then LEFT(PatientName,CHARINDEX(',',REPLACE( PatientName, '^',','))-1) 
        else PatientName end as FullName 
    from #Patient
) ,
q2 as (
    select PatientID 
    from q1 
    group by PatientID having COUNT(1) > 1 and COUNT(DISTINCT FullName) > 1 )
select t.PatientID,t.PatientName
from #Patient t join q2 on t.PatientID = q2.PatientID;

Upvotes: 0

Related Questions