lambda8
lambda8

Reputation: 317

SQL: select with disctint left

table

            id name   data
            ---------------
            1  name1  data1
            2  name2  data2
            3  name3  data3
            4  name4  data1abc
            5  name5  data2abc
            6  name6  data1abcd
            7  name7  data2abcde

The output that I need is rows with ids 3, 6 and 7.

I need to search for distinct data terms. The terms data1, data1abc, data1abcd should all be counted as one term and the rows with unique most characters should be returned i.e., data1abcd, data2abcde, data3

Can you help please?

This is what I have written so far, it doesnt work:

            SELECT *
            FROM table
            WHERE LEFT(data, 5) = (
                SELECT distinct LEFT(data, 5)
                FROM table
                )

Upvotes: 2

Views: 81

Answers (3)

Clodoaldo Neto
Clodoaldo Neto

Reputation: 125284

This will work for any string length:

select s
from (
    select
        s,
        case
            when s = left(lag(s) over (order by s desc), length(s))
            then false else true
        end as u
    from t
) t
where u
order by s
;
     s      
------------
 data1abcd
 data2abcde
 data3

Sample data:

create table t (id int, name text, s text);

insert into t (id, name, s) values
(1, 'name1', 'data1'),
(2, 'name2', 'data2'),
(3, 'name3', 'data3'),
(4, 'name4', 'data1abc'),
(5, 'name5', 'data2abc'),
(6, 'name6', 'data1abcd'),
(7, 'name7', 'data2abcde');

Upvotes: 2

user5480949
user5480949

Reputation: 1678

It seems that you want distinct non-numeric values. You may change data_len as len(data_r) if you want the longest non-numeric records

ALTER TABLE <yourtable>
ADD 
 data_r as REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE (data, '0', ''),'1', ''),'2', ''),'3', ''),'4', ''),'5', ''),'6', ''),'7', ''),'8', ''),'9', '')
,data_len as len(data)

SELECT * FROM <yourtable> t
WHERE EXISTS ( SELECT MAX(data_len), data_r 
               FROM <yourtable> t1
               WHERE t.data_r = t1.data_r
               AND t.data_len = t1.data_len
               GROUP BY data_r
             )

Upvotes: -1

Mureinik
Mureinik

Reputation: 311478

You could partition the rows by the first five characters and take the longest:

SELECT name, data
FROM   (SELECT name, 
               data,
               RANK() OVER (PARTITION BY LEFT(data, 5) 
                            ORDER BY LENGTH(data) DESC) AS rk
        FROM   mytable) t
WHERE  rk = 1

Note:
The rank function allows for ties (e.g., data1a and data1b have the same length, so they could both have the rank of 1 if there a no longer strings). If you do not want to allow for ties, you should use row_number instead.

Upvotes: 0

Related Questions