SNaRe
SNaRe

Reputation: 2067

How to delist subset strings if it is a substring of another string

I have a list of strings in my database let say in a column

understand
understan
understa
underst
unders
under

I'm trying to find out How to delist subset strings if it is a substring of another string with sql?

So I if we pretend that this is a column of my table, the end result must be only

understand

Upvotes: 1

Views: 63

Answers (3)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173028

Below is for BigQuery Standard SQL

#standardSQL
SELECT str FROM (
  SELECT str, 
    STARTS_WITH(LAG(str) OVER(ORDER BY str DESC), str) flag 
  FROM `project.dataset.table`
)
WHERE NOT IFNULL(flag, FALSE)   

I tested above with dummy data similar to what you provided in your question

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'anderstand' str UNION ALL
  SELECT 'anderstan' UNION ALL
  SELECT 'andersta' UNION ALL
  SELECT 'anderst' UNION ALL
  SELECT 'understand' str UNION ALL
  SELECT 'understan' UNION ALL
  SELECT 'understa' UNION ALL
  SELECT 'underst' UNION ALL
  SELECT 'unders' UNION ALL
  SELECT 'under' 
)
SELECT str FROM (
  SELECT str, 
    STARTS_WITH(LAG(str) OVER(ORDER BY str DESC), str) flag 
  FROM `project.dataset.table`
)
WHERE NOT IFNULL(flag, FALSE)    

with result

Row str  
1   understand   
2   anderstand   

which I believe is exactly what you expected

Upvotes: 3

Gordon Linoff
Gordon Linoff

Reputation: 1270021

To solve this problem, I would recommend lead():

select t.*
from (select t.*,
             lead(str) over (order by str) as next_str
      from t
     ) t
where next_str not like concat(str, '%') or
      next_str is not null;

Upvotes: 0

Thomas Strub
Thomas Strub

Reputation: 1285

In Oracle a slow method would be:

with b as(
     select substr('understand',1,level)w from dual connect by level <= 10)
     union all 
     select substr('asdfasdfad',1,level)w from dual connect by level <= 10
,chk as( 
    select s.w p,t.w f , substr(t.w,1,length(t.w)-1)
    from b s,b t
    where s.w like substr(t.w,1,length(t.w)-1) ||  '%'
    and length( s.w) > length(t.w))
select w from  b
minus 
select f from chk

Upvotes: 0

Related Questions