Reputation: 472
I would like to perform a "special group by" on strings with SQL language, some ending with "*". I use postgresql. I can not clearly formulate this problem, even if I have partially solved it, with select, union and nested queries which are not elegant.
For exemple :
1) INPUT : I have a list of strings :
thestrings
varchar(9)
--------------
1000
1000-0001
1000-0002
2000*
2000-0001
2000-0002
3000*
3000-00*
3000-0001
3000-0002
2) OUTPUT : That I would like my "special group by" return :
1000
1000-0001
1000-0002
2000*
3000*
Because 2000-0001 and 2000-0002 are include in 2000*, and because 3000-00*, 3000-0001 and 3000-0002 are includes in 3000*
3) SQL query I do :
SELECT every strings ending with *
UNION
SELECT every string where the begining NOT IN (SELECT every string ending with *) <-- with multiple inelegant left functions and NOT IN subqueries
4) That what I'm doing return :
1000
1000-0001
1000-0002
2000*
3000*
3000-00* <-- the problem
The problem is : 3000-00* staying in my result.
So my question is : How can I generalize my problem? to remove all string who have a same begining string in the list (ending with *) ? I think of regular expressions, but how to pass a list from a select in a regex ?
Thanks for help.
Upvotes: 1
Views: 72
Reputation: 95101
Select only strings for which no master string exists in the table:
select str
from mytable
where not exists
(
select *
from mytable master
where master.str like '%*'
and master.str <> mytable.str
and rtrim(mytable.str, '*') like rtrim(master.str, '*') || '%'
);
Upvotes: 2
Reputation: 1271151
Assuming that only one general pattern can match any given string, the following should do what you want:
select coalesce(tpat.thestring, t.thestring) as thestring
from t left join
t tpat
on t.thestring like replace(tpat.thestring, '*', '%') and
t.thestring <> tpat.thestring
group by coalesce(tpat.thestring, t.thestring);
However, that is not your case. However, you can adjust this with distinct on
:
select distinct on (t.thestring) coalesce(tpat.thestring, t.thestring)
from t left join
t tpat
on t.thestring like replace(tpat.thestring, '*', '%') and
t.thestring <> tpat.thestring
order by t.thestring, length(tpat.thestring)
Upvotes: 1