Ramesh
Ramesh

Reputation:

How to find repeated words of a cell in SQL

I have a column named "situation" and "entityid".

Entityid    Situation
1234        In the the world of of
3456        Total universe is is a

Can any one please give me query to find these type of higlihted words.

Thanks Ramesh

Upvotes: 2

Views: 2271

Answers (2)

D'Arcy Rittich
D'Arcy Rittich

Reputation: 171511

If you want to hard code it:

select EntityID, Situation
from Entity
where Situation like '%the the%'
or Situation like '%of of%'
or Situation like '%is is%'

Update: Here is a slightly less hard-coded approach:

select EntityID, Situation, right(s2, diff * 2 + 1) as RepeatedWords
from (
    select EntityID, Situation, WordNumber,
        substring_index(Situation, ' ', WordNumber) s1,
        substring_index(Situation, ' ', WordNumber + 1) s2,
        length(substring_index(Situation, ' ', WordNumber + 1)) - length(substring_index(Situation, ' ', WordNumber)) -1 diff
    from `Entity` e
    inner join (
        select 1 as WordNumber
        union all
        select 2 
        union all
        select 3 
        union all
        select 4 
        union all
        select 5 
        union all
        select 6 
        union all
        select 7 
        union all
        select 8 
        union all
        select 9 
        union all
        select 10 
    ) n
) a
where right(s1, diff) = right(s2, diff)
    and diff > 0
order by EntityID, WordNumber

It will search up to the first 10 words or so, and doesn't handle case, punctuation or multiple spaces properly, but it should give you an idea of an approach you can take. If you want it to handle longer strings, just keep adding to the UNION ALL statements.

Upvotes: 1

Tormod
Tormod

Reputation: 4573

If you are willing to use SQL Server Express, you will be able to create a CLR User Defined Function.

http://msdn.microsoft.com/en-us/library/w2kae45k(VS.80).aspx

You will then have the power of Regular Expressions at your finger tips.

Then, depending on your proficiency with RegEx, you're either left with zero problems or two problems.

Upvotes: 0

Related Questions