Reputation: 2329
What I want to do is to get all records that have almost exact duplicates except that duplicates don't have an extra char at the beginning of 'name'
this is my sql query:
select * from tags as spaced inner join tags as not_spaced on not_spaced.name = substring(spaced.name, 2);
also I tried:
select * from tags as spaced where (select count(*) from tags as not_spaced where not_spaced.name = substring(spaced.name, 2)) > 0;
What I'm getting is... the SQL connection stops responding. Thanks!
p.s. Sorry I haven't mentioned that the only field I need is name. All other fields are insignificant (if present).
Upvotes: 1
Views: 251
Reputation:
Try something like this:
select
all potentially duplicated fields except name , name
from tags union all
select
all potentially duplicated fields except name , substring(name, 2) name
from tags
group by
all potentially duplicated fields including name
having count(*) > 1
Upvotes: 1
Reputation: 2367
Without knowing the DB, how the tables are indexed, etc, it's just trying different things until one gets better optimized...
Here is another query you can try:
SELECT name, count(*) c FROM (
SELECT name FROM tags
UNION ALL
SELECT substring(name, 2) AS name FROM tags
) AS t
GROUP BY name
Upvotes: 0
Reputation: 86716
Even with an Index, your query will require every record in spaced
to be checked against every record in tags
.
If each table has 1,000 records, that's 1,000,000 combinations.
You may be better off creating a temporary table with just two fields spaced.id, substring(t2.name, 2) as shortname
, then index the shortname field. Joining on that temporary and indexed table will be much much faster.
Upvotes: 0
Reputation: 66697
If the tables are very large, make an index
on name
and substring(name,2)
to make it faster:
select t1.* from tags t1
inner join tags t2 on t1.name = substring(t2.name, 2)
Upvotes: 0