SQL Select without duplicates, and conditional choice of the proper duplicate to keep

Question

I have a readonly table with a list of products, and I need to avoid selecting duplicates based on the serial number ('serial'). When I have a duplicate I want to select the duplicate which 'label' first letter is between A and J.

Here are my data and my try to get a selection without duplicates:

CREATE TABLE products(id INT, serial VARCHAR(25), label VARCHAR(50) , type VARCHAR(25));

INSERT INTO products(id, serial, label, type)
    VALUES
        ( 1, '111', 'A1', 'computer'),
        ( 2, '222', 'B2', 'computer'),
        ( 3, '333', 'Z3', 'computer'),
        ( 4, '333', 'D4', 'computer'),
        ( 5, '555', 'E5', 'computer'),
        ( 6, '666', 'X6', 'computer'),
        ( 7, '777', 'G7', 'computer'),
        ( 8, '777', 'Y7', 'computer'),
        ( 9, '888', 'I8', 'computer'),
        (10, '999', 'J9', 'screen'),
        (11, '777', 'G7bis', 'computer'),
        (12, '666', 'X6bis', 'computer');


SELECT COUNT(serial) OVER(PARTITION BY serial) as nbserial, *
FROM products
where type='computer' and nbserial=1 or
(nbserial>1 and LEFT(label, 1) between 'A' and 'J')
;

I have several problems: here I cannot define a condition about nbserial in the where clause. And if there are 3 duplicates, I need to select a line which verifies the condition: label first letter is between A and J. And if there are several duplicates, but none verifies the condition (firstletter between A and J), then select any line.

example of expected result: (no serial duplicate, and if possile label starts with letter between A and J)

    ( 1, '111', 'A1', 'computer'),
    ( 2, '222', 'B2', 'computer'),
    ( 4, '333', 'D4', 'computer'),
    ( 5, '555', 'E5', 'computer'),
    ( 6, '666', 'X6', 'computer'),
    ( 7, '777', 'G7', 'computer'),
    ( 9, '888', 'I8', 'computer'),
    (10, '999', 'J9', 'screen'),

How can I do that with a SELECT, and I cannot change the table content ?

Thanks

GMB · Accepted Answer

You can use row_number() and a conditional sort:

select *
from (
    select p.*,
        row_number() over(
            partition by serial
            order by case when left(label, 1) between 'A' and 'J' then 0 else 1 end, id
        ) rn
    from products p
) p
where rn = 1

Or better yet, use distinct on in Postgres:

select distinct on (serial) p.*
from products p
order by serial, (left(label, 1) between 'A' and 'J') desc, id

This gives one row per serial, and prioritizes labels whose first letter is between "A" and "J". When there are ties, the row with the least id is retained.

Demo on DB Fiddle:

id | serial | label | type    
-: | :----- | :---- | :-------
 1 | 111    | A1    | computer
 2 | 222    | B2    | computer
 4 | 333    | D4    | computer
 5 | 555    | E5    | computer
 6 | 666    | X6    | computer
 7 | 777    | G7    | computer
 9 | 888    | I8    | computer
10 | 999    | J9    | screen

SQL Select without duplicates, and conditional choice of the proper duplicate to keep

Answers (1)

Related Questions