teelove
teelove

Reputation: 71

Compare columns and dedupe

Please see below a table where I would like to create a distinct list of telephone numbers, the ranking GROUP numbers need to take priority descending 1-4. The example should result in ID 4, 9,10 (highlighted as red x) being removed as they share duplicate telephone numbers but in a lower priority GROUP.

I have managed to do this a long way round in many steps and creating a surrogate key, but I would like it if someone could show me how this could be done in a single step, for efficiency (and to sharpen up my SQL).

In SAS9 I had a coalescec function where I was able to create a new variable and instruct the priority - but I know of no such function in SQL.

https://documentation.sas.com/doc/en/vdmmlcdc/8.1/ds2ref/n0crpo0xd76wb3n1poba9wmu1a6q.htm

Markers showing the duplicates - green stay - red remove

please see mock data in SQL below

    CREATE TABLE MT_TEMP_TEL2 (
  ID INT NOT NULL,
  GROUP INT NOT NULL,
  TEL1 VARCHAR(11),
  TEL2 VARCHAR(11),
  TEL3 VARCHAR(11),
  TEL4 VARCHAR(11)
);

INSERT INTO MT_TEMP_TEL2
VALUES 
  (1,4,"79000000000","","",""),
  (2,1,"12111111111","79999999999","",""),
  (3,1,"","13842222222","",""),
  (4,2,"","","78888888888","12111111111"),
  (5,3,"","73333333333","",""),
  (6,1,"15278888888","","",""),
  (7,2,"","","71111111111",""),
  (8,4,"","13843333333","","72222222222"),
  (9,4,"","73333333333","",""),
  (10,3,"75555555555","","12155555555","13842222222");

Here is the desired results as stated above, we have now removed rows 4, 9 and 10

ID GROUP TEL1 TEL2 TEL3 TEL4
1 4 79000000000
2 1 12111111111 79999999999
3 1 13842222222
5 3 73333333333
6 1 15278888888
7 2 71111111111
8 4 13843333333 72222222222

Upvotes: 0

Views: 73

Answers (1)

Stu
Stu

Reputation: 32619

I had assumed with the mention of coalesce that the goal was to have the 4 columns condensed into a single list of values, the expected results example shows now that's not the case.

There are a few ways to tackle this, one is to use window functions to assign a sequential number to each reccuring value; to do that, the values must be combined into a single list for which you can use union, the results of which are then partitioned by the combined values and each duplicate gets a sequential row number. It's then a matter of keeping just those with a value of 1.

with Alltel as (
    select id, [group], tel1 from MT_TEMP_TEL2 union all
    select id, [group], tel2 from MT_TEMP_TEL2 union all
    select id, [group], tel3 from MT_TEMP_TEL2 union all
    select id, [group], tel4 from MT_TEMP_TEL2 
), rn as (
    select id, tel1, Row_Number() over (partition by tel1 order by [group]) n
    from alltel
    where tel1 !=''
)
select * 
from MT_TEMP_TEL2 t
where not exists (select * from rn where rn.id=t.id and rn.n>1)

Upvotes: 1

Related Questions