Reputation: 317
table
id name data
---------------
1 name1 data1
2 name2 data2
3 name3 data3
4 name4 data1abc
5 name5 data2abc
6 name6 data1abcd
7 name7 data2abcde
The output that I need is rows with ids 3, 6 and 7.
I need to search for distinct data terms. The terms data1, data1abc, data1abcd should all be counted as one term and the rows with unique most characters should be returned i.e., data1abcd, data2abcde, data3
Can you help please?
This is what I have written so far, it doesnt work:
SELECT *
FROM table
WHERE LEFT(data, 5) = (
SELECT distinct LEFT(data, 5)
FROM table
)
Upvotes: 2
Views: 81
Reputation: 125284
This will work for any string length:
select s
from (
select
s,
case
when s = left(lag(s) over (order by s desc), length(s))
then false else true
end as u
from t
) t
where u
order by s
;
s
------------
data1abcd
data2abcde
data3
Sample data:
create table t (id int, name text, s text);
insert into t (id, name, s) values
(1, 'name1', 'data1'),
(2, 'name2', 'data2'),
(3, 'name3', 'data3'),
(4, 'name4', 'data1abc'),
(5, 'name5', 'data2abc'),
(6, 'name6', 'data1abcd'),
(7, 'name7', 'data2abcde');
Upvotes: 2
Reputation: 1678
It seems that you want distinct non-numeric values. You may change data_len as len(data_r) if you want the longest non-numeric records
ALTER TABLE <yourtable>
ADD
data_r as REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE (data, '0', ''),'1', ''),'2', ''),'3', ''),'4', ''),'5', ''),'6', ''),'7', ''),'8', ''),'9', '')
,data_len as len(data)
SELECT * FROM <yourtable> t
WHERE EXISTS ( SELECT MAX(data_len), data_r
FROM <yourtable> t1
WHERE t.data_r = t1.data_r
AND t.data_len = t1.data_len
GROUP BY data_r
)
Upvotes: -1
Reputation: 311478
You could partition the rows by the first five characters and take the longest:
SELECT name, data
FROM (SELECT name,
data,
RANK() OVER (PARTITION BY LEFT(data, 5)
ORDER BY LENGTH(data) DESC) AS rk
FROM mytable) t
WHERE rk = 1
Note:
The rank
function allows for ties (e.g., data1a
and data1b
have the same length, so they could both have the rank of 1 if there a no longer strings). If you do not want to allow for ties, you should use row_number
instead.
Upvotes: 0