Reputation: 1313
I have a readonly table with a list of products, and I need to avoid selecting duplicates based on the serial number ('serial'). When I have a duplicate I want to select the duplicate which 'label' first letter is between A and J.
Here are my data and my try to get a selection without duplicates:
CREATE TABLE products(id INT, serial VARCHAR(25), label VARCHAR(50) , type VARCHAR(25));
INSERT INTO products(id, serial, label, type)
VALUES
( 1, '111', 'A1', 'computer'),
( 2, '222', 'B2', 'computer'),
( 3, '333', 'Z3', 'computer'),
( 4, '333', 'D4', 'computer'),
( 5, '555', 'E5', 'computer'),
( 6, '666', 'X6', 'computer'),
( 7, '777', 'G7', 'computer'),
( 8, '777', 'Y7', 'computer'),
( 9, '888', 'I8', 'computer'),
(10, '999', 'J9', 'screen'),
(11, '777', 'G7bis', 'computer'),
(12, '666', 'X6bis', 'computer');
SELECT COUNT(serial) OVER(PARTITION BY serial) as nbserial, *
FROM products
where type='computer' and nbserial=1 or
(nbserial>1 and LEFT(label, 1) between 'A' and 'J')
;
I have several problems: here I cannot define a condition about nbserial in the where clause. And if there are 3 duplicates, I need to select a line which verifies the condition: label first letter is between A and J. And if there are several duplicates, but none verifies the condition (firstletter between A and J), then select any line.
example of expected result: (no serial duplicate, and if possile label starts with letter between A and J)
( 1, '111', 'A1', 'computer'),
( 2, '222', 'B2', 'computer'),
( 4, '333', 'D4', 'computer'),
( 5, '555', 'E5', 'computer'),
( 6, '666', 'X6', 'computer'),
( 7, '777', 'G7', 'computer'),
( 9, '888', 'I8', 'computer'),
(10, '999', 'J9', 'screen'),
How can I do that with a SELECT, and I cannot change the table content ?
Thanks
Upvotes: 0
Views: 69
Reputation: 222432
You can use row_number()
and a conditional sort:
select *
from (
select p.*,
row_number() over(
partition by serial
order by case when left(label, 1) between 'A' and 'J' then 0 else 1 end, id
) rn
from products p
) p
where rn = 1
Or better yet, use distinct on
in Postgres:
select distinct on (serial) p.*
from products p
order by serial, (left(label, 1) between 'A' and 'J') desc, id
This gives one row per serial
, and prioritizes labels whose first letter is between "A" and "J". When there are ties, the row with the least id
is retained.
id | serial | label | type -: | :----- | :---- | :------- 1 | 111 | A1 | computer 2 | 222 | B2 | computer 4 | 333 | D4 | computer 5 | 555 | E5 | computer 6 | 666 | X6 | computer 7 | 777 | G7 | computer 9 | 888 | I8 | computer 10 | 999 | J9 | screen
Upvotes: 1