meandre
meandre

Reputation: 2329

self join with a self-referring condition

What I want to do is to get all records that have almost exact duplicates except that duplicates don't have an extra char at the beginning of 'name'

this is my sql query:

select * from tags as spaced inner join tags as not_spaced on not_spaced.name = substring(spaced.name, 2);

also I tried:

 select * from tags as spaced where (select count(*) from tags as not_spaced where not_spaced.name = substring(spaced.name, 2)) > 0;

What I'm getting is... the SQL connection stops responding. Thanks!

p.s. Sorry I haven't mentioned that the only field I need is name. All other fields are insignificant (if present).

Upvotes: 1

Views: 251

Answers (4)

user359040
user359040

Reputation:

Try something like this:

select all potentially duplicated fields except name , name

from tags union all

select all potentially duplicated fields except name , substring(name, 2) name

from tags

group by all potentially duplicated fields including name

having count(*) > 1

Upvotes: 1

angus
angus

Reputation: 2367

Without knowing the DB, how the tables are indexed, etc, it's just trying different things until one gets better optimized...

Here is another query you can try:

SELECT name, count(*) c FROM (
    SELECT name FROM tags
    UNION ALL
    SELECT substring(name, 2) AS name FROM tags
) AS t
GROUP BY name

Upvotes: 0

MatBailie
MatBailie

Reputation: 86716

Even with an Index, your query will require every record in spaced to be checked against every record in tags.

If each table has 1,000 records, that's 1,000,000 combinations.


You may be better off creating a temporary table with just two fields spaced.id, substring(t2.name, 2) as shortname, then index the shortname field. Joining on that temporary and indexed table will be much much faster.

Upvotes: 0

aF.
aF.

Reputation: 66697

If the tables are very large, make an index on name and substring(name,2) to make it faster:

select t1.* from tags t1
inner join tags t2 on t1.name = substring(t2.name, 2)

Upvotes: 0

Related Questions