Reputation: 2067
I have a list of strings in my database let say in a column
understand
understan
understa
underst
unders
under
I'm trying to find out How to delist subset strings if it is a substring of another string with sql?
So I if we pretend that this is a column of my table, the end result must be only
understand
Upvotes: 1
Views: 63
Reputation: 173028
Below is for BigQuery Standard SQL
#standardSQL
SELECT str FROM (
SELECT str,
STARTS_WITH(LAG(str) OVER(ORDER BY str DESC), str) flag
FROM `project.dataset.table`
)
WHERE NOT IFNULL(flag, FALSE)
I tested above with dummy data similar to what you provided in your question
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'anderstand' str UNION ALL
SELECT 'anderstan' UNION ALL
SELECT 'andersta' UNION ALL
SELECT 'anderst' UNION ALL
SELECT 'understand' str UNION ALL
SELECT 'understan' UNION ALL
SELECT 'understa' UNION ALL
SELECT 'underst' UNION ALL
SELECT 'unders' UNION ALL
SELECT 'under'
)
SELECT str FROM (
SELECT str,
STARTS_WITH(LAG(str) OVER(ORDER BY str DESC), str) flag
FROM `project.dataset.table`
)
WHERE NOT IFNULL(flag, FALSE)
with result
Row str
1 understand
2 anderstand
which I believe is exactly what you expected
Upvotes: 3
Reputation: 1270021
To solve this problem, I would recommend lead()
:
select t.*
from (select t.*,
lead(str) over (order by str) as next_str
from t
) t
where next_str not like concat(str, '%') or
next_str is not null;
Upvotes: 0
Reputation: 1285
In Oracle a slow method would be:
with b as(
select substr('understand',1,level)w from dual connect by level <= 10)
union all
select substr('asdfasdfad',1,level)w from dual connect by level <= 10
,chk as(
select s.w p,t.w f , substr(t.w,1,length(t.w)-1)
from b s,b t
where s.w like substr(t.w,1,length(t.w)-1) || '%'
and length( s.w) > length(t.w))
select w from b
minus
select f from chk
Upvotes: 0