gustyaquino
gustyaquino

Reputation: 767

Adding multiple conditions to MySQL Inner Join

the goal of the query is also to find possible duplicates of names that were mistyped. Example:

International Group Inc. must be find as a duplicate of International, Group Inc

In order to accomplish this a used the next query:

SELECT C.id,
       C.name,
       C.address,
       C.city_id
FROM   company C
       INNER JOIN (SELECT name
                   FROM   company
                   GROUP  BY name
                   HAVING Count(id) > 1) D
               ON Replace(Replace(C.name, '.', ''), ',', '') =
                  Replace(Replace(D.name, '.', ''), ',', '')  

It works very well and the result came at 40 secs but adding an extra condition like AND C.city_id='4' requires an extra minute or more; This is still acceptable but not preferable.

My real problem occurs when I try to add another condition to find out only duplicates of companies that have a specific string in the name, using this condition AND C.name LIKE '%International%', this just don't return any results.

Could somebody help me figure out what I am doing wrong?

Thanks

Upvotes: 5

Views: 1370

Answers (2)

RandomSeed
RandomSeed

Reputation: 29759

Because you are joining on the result of a function, the query cannot use any index. Besides, the cost of executing the REPLACE() on all rows is probably not negligible.

I suggest you first add an indexed column that receives the "stripped-down" version of the strings, and then run the query with a join on this column:

ALTER TABLE company ADD COLUMN stripped_name VARCHAR(50);
ALTER TABLE company ADD INDEX(stripped_name);
UPDATE TABLE company SET stripped_name = REPLACE(REPLACE(name, '.', ''), ',', '') ;

Running the UPDATE could take a while the first time, but you could also set an ON UPDATE and an ON INSERT triggers on company so that stripped_name gets populated and update on-the-fly.

Upvotes: 6

Stephan
Stephan

Reputation: 8090

Try starting from the tmp table because foreach row in company a tmp table will be created :

SELECT C.id,
       C.name,
       C.address,
       C.city_id
FROM (SELECT name
                   FROM   company
                   GROUP  BY name
                   HAVING Count(id) > 1) D  
INNER JOIN company C       
               ON Replace(Replace(C.name, '.', ''), ',', '') =
                  Replace(Replace(D.name, '.', ''), ',', '')  

Upvotes: 0

Related Questions