volume one
volume one

Reputation: 7563

Removing duplicate rows by selecting only those with minimum length value

I have a table with two string columns: Name and Code. Code is unique, but Name is not. Sample data:

Name      Code
--------  ----
Jacket    15
Jeans     003
Jeans     26

I want to select unique rows with the smallest Code value, but not in terms of numeric value; rather, the length of the string. Of course this does not work:

SELECT Name, Min(Code) as Code
FROM Clothes
GROUP BY Name, Code

The above code will return one row for Jeans like such:

Jeans | 003

That is correct, because as a number, 003 is less than 26. But not in my application, which cares about the length of the value, not the actual value. A value with a length of three characters is greater than a value with two characters. I actually need it to return this:

Jeans | 26

Because the length of 26 is shorter than the length of 003.

So how do I write SQL code that will select row that has the code with the minimum length, not the actual minimum value? I tried doing this:

SELECT Name, Min(Len(Code)) as Code
FROM Clothes
GROUP BY Name, Code

The above returns me only a single character so I end up with this:

Jeans | 2

Upvotes: 1

Views: 1723

Answers (3)

Aaron Bertrand
Aaron Bertrand

Reputation: 280429

;WITH cte AS
(
  SELECT Name, Code, rn = ROW_NUMBER()
    OVER (PARTITION BY Name ORDER BY LEN(Code))
    FROM dbo.Clothes
)
SELECT Name, Code
FROM cte
WHERE rn = 1;

SQLfiddle demo

If you have multiple values of code that share the same length, the choice will be arbitrary, so you can break the tie by adding an additional order by clause, e.g.

OVER (PARTITION BY Name ORDER BY LEN(Code), CONVERT(INT, Code) DESC)

SQLfiddle demo

Upvotes: 4

podiluska
podiluska

Reputation: 51504

Try this

select clothes.name, MIN(code)
from clothes
    inner join
    (
        SELECT 
            Name, Min(Len(Code)) as CodeLen
        FROM
            clothes 
        GROUP BY
        Name
    ) results
on clothes.name = results.name
and LEN(clothes.code) = results.CodeLen
group by clothes.name

Upvotes: 1

JeffSahol
JeffSahol

Reputation: 971

It sounds like you are trying to sort on the numeric value of the Code field. If so, the correct approach would be to cast it to INT first, and use that for sorting/min functions (in a subquery), then select the original code in your main query clause.

Upvotes: 0

Related Questions