Павел Иванов
Павел Иванов

Reputation: 1154

Optimize sql query to get duplicates

A have a following sql query:

SELECT users.* FROM users users

WHERE users.name <> '' and users.email <> '' and users.phone <> ''

and users.name in (  SELECT name
            FROM users
                where name <> '' and name is not null
            GROUP BY name
            HAVING count(name) > 1 )
and users.email in (  SELECT email
            FROM users
                where email <> '' and email is not null
            GROUP BY email
            HAVING count(email) > 1 )
and users.phone in (  SELECT phone
            FROM users
                where phone <> '' and phone is not null
            GROUP BY phone
            HAVING count(phone) > 1 )
ORDER BY users.name+users.email+users.phone ASC
LIMIT 0,200

Which unfortunately run very slow on huge database. Is there any option to optimize this query?

Idea for query result: get all records that has duplicates in database (for example get users with same name+same phone+same email

I tried with inner join but seems not work correctly

Upvotes: 1

Views: 306

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270081

If you want users with the same name, phone, and email, then use group by:

select u.name, u.phone, u.email, group_concat(u.user_id)
from users u
group by u.name, u.phone, u.email
having count(*) > 1;

If you want all rows, rather than just the ids in a list, then use join:

select u.*
from (select u.name, u.phone, u.email
      from users u
      group by u.name, u.phone, u.email
      having count(*) > 1
     ) udup join
     users u
     on u.name = udup.name and u.phone = udup.phone and u.email = udup.email
order by u.name, u.phone, u.email;

Note: These queries do not do what your original query does. Instead, it is based on the logic you describe in the text ("for example get users with same name+same phone+same email").

Upvotes: 4

Related Questions