NIF
NIF

Reputation: 472

SQL special group by on list of strings ending with *

I would like to perform a "special group by" on strings with SQL language, some ending with "*". I use postgresql. I can not clearly formulate this problem, even if I have partially solved it, with select, union and nested queries which are not elegant.

For exemple :

1) INPUT : I have a list of strings :

thestrings
varchar(9)
--------------
1000
1000-0001
1000-0002
2000*
2000-0001
2000-0002
3000*
3000-00*
3000-0001
3000-0002

2) OUTPUT : That I would like my "special group by" return :

1000
1000-0001
1000-0002
2000*
3000*

Because 2000-0001 and 2000-0002 are include in 2000*, and because 3000-00*, 3000-0001 and 3000-0002 are includes in 3000*

3) SQL query I do :

SELECT every strings ending with *
UNION
SELECT every string where the begining  NOT IN  (SELECT every string ending with *)   <-- with multiple inelegant left functions and NOT IN subqueries

4) That what I'm doing return :

1000
1000-0001
1000-0002
2000*
3000*
3000-00* <-- the problem

The problem is : 3000-00* staying in my result.

So my question is : How can I generalize my problem? to remove all string who have a same begining string in the list (ending with *) ? I think of regular expressions, but how to pass a list from a select in a regex ?

Thanks for help.

Upvotes: 1

Views: 72

Answers (2)

Thorsten Kettner
Thorsten Kettner

Reputation: 95101

Select only strings for which no master string exists in the table:

select str
from mytable
where not exists 
(
  select *
  from mytable master
  where master.str like '%*'
  and master.str <> mytable.str
  and rtrim(mytable.str, '*') like rtrim(master.str, '*') || '%'
);

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1271151

Assuming that only one general pattern can match any given string, the following should do what you want:

select coalesce(tpat.thestring, t.thestring) as thestring
from t left join
     t tpat
     on t.thestring like replace(tpat.thestring, '*', '%') and
        t.thestring <> tpat.thestring
group by coalesce(tpat.thestring, t.thestring);

However, that is not your case. However, you can adjust this with distinct on:

  select distinct on (t.thestring) coalesce(tpat.thestring, t.thestring)
  from t left join
       t tpat
       on t.thestring like replace(tpat.thestring, '*', '%') and
          t.thestring <> tpat.thestring
  order by t.thestring, length(tpat.thestring)

Upvotes: 1

Related Questions