Reputation: 25259
i have a table like this
TITLE | DESCRIPTION
------------------------------------------------
test1 | value blah blah value
test2 | value test
test3 | test test test
test4 | value test value test
how to select only the rows that contain consequent redundant strings ("blah blah", but not "blah bleh blah")?
the desired output should be only
TITLE | DESCRIPTION
------------------------------------------------
test1 | value blah blah value
test3 | test test test
Upvotes: 0
Views: 56
Reputation: 350770
You could create for this problem (and many other problems) a helper table (just once), which contains natural numbers. It can be used for many purposes:
create table seq (num int);
insert into seq values (1),(2),(3),(4),(5),(6),(7),(8);
insert into seq select num+8 from seq;
insert into seq select num+16 from seq;
insert into seq select num+32 from seq;
insert into seq select num+64 from seq;
/* continue doubling the number of records until you feel you have enough */
Then you can join that table in your query, where each number is used as a sequence number of a word in a phrase. That way you can extract each word and compare it with the next:
select title, description
from phrases
where description not in (
select description
from phrases p
inner join seq
on seq.num <= length(p.description)
- length(replace(p.description,' ',''))
and substring_index(substring_index(
description, ' ', num), ' ', -1)
= substring_index(substring_index(
description, ' ', num+1), ' ', -1)
)
The output for the sample data is:
| title | description |
|-------|-----------------------|
| test2 | value test |
| test4 | value test value test |
Upvotes: 1