winery
winery

Reputation: 23

Eliminating duplicate rows except one column with condition

I am having trouble trying to find an appropriate query(SQL-SERVER) for selecting records with condition however, the table I will be using has more than 100,000 rows and more than 20 columns.

So I need a code that satisfies the following condition:

1.)If [policy] and [plan] column is unique between rows then I will select that record

2.)If [policy] and [plan] return 2 or more rows then I will select the record which 'code' column isn't 999

3.)In some cases the unwanted rows may not have '999' in [code] column but may be other specifics

In other words, I would like to get row number 1,2,4,5,7.

Here is an example of what the table looks like

row #|policy|plan|code
-----------------------
  1  |  a   | aa |111
-----------------------
  2  |  b   | bb |112
-----------------------
  3  |  b   | bb |999
-----------------------
  4  |  c   | cc |111
-----------------------
  5  |  c   | cc |112
-----------------------
  6  |  c   | cc |999
-----------------------
  7  |  d   | dd |999
-----------------------

I'm expecting to see something like

row #|policy|plan|code
-----------------------
  1  |  a   | aa |111
-----------------------
  2  |  b   | bb |112
-----------------------
  4  |  c   | cc |111
-----------------------
  5  |  c   | cc |112
-----------------------
  7  |  d   | dd |999
-----------------------

Thank you in advance

Upvotes: 1

Views: 4917

Answers (3)

etsa
etsa

Reputation: 5060

May be you want this (eliminate the last row if more than one)?

select t.*
from (select t.*
            , row_number() over (partition by policy, plan
                          order by code desc
                         ) AS RN
            , COUNT(*) over (partition by policy, plan) AS RC
      from t
     ) t
where RN > 1 OR RN=RC;

Output:

    row policy  plan    code    RN  RC
1   1   a   aa  111 1   1
2   2   b   bb  112 2   2
3   5   c   cc  112 2   3
4   4   c   cc  111 3   3
5   7   d   dd  999 1   1

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269693

This sounds like a prioritization query. You an use row_number():

select t.*
from (select t.*,
             row_number() over (partition by policy, plan
                                order by code
                               ) as seqnum
      from t
     ) t
where seqnum = 1;

The expected output makes this a bit clearer:

select t.*
from (select t.*,
             rank() over (partition by policy, plan
                          order by (case when code = 999 then 1 else 2 end) desc
                         ) as seqnum
      from t
     ) t
where seqnum = 1;

The OP wants all codes that are not 999 unless the only codes are 999. So, another approach is:

select t.*
from t
where t.code <> 999
union all
select t.*
from t
where t.code = 999 and
      not exists (select 1
                  from t t2
                  where t2.policy = t.policy and t2.plan = t.plan and
                        t2.code <> 999
                 );

Upvotes: 6

Chanukya
Chanukya

Reputation: 5893

CREATE TABLE #Table2
    ([row] int, [policy] varchar(1), [plan] varchar(2), [code] int)
;

INSERT INTO #Table2
    ([row], [policy], [plan], [code])
VALUES
    (1, 'a', 'aa', 111),
    (2, 'b', 'bb', 112),
    (3, 'b', 'bb', 999),
    (4, 'c', 'cc', 111),
    (5, 'c', 'cc', 112),
    (6, 'c', 'cc', 999),
    (7, 'd', 'dd', 999)
;
with cte 
as
(
select *,
 row_number() over (partition by policy, [plan]
                                order by code
                               ) as seqnum
      from #Table2
  )

  select [row], [policy], [plan], [code] from cte where seqnum=1

Upvotes: 0

Related Questions