Debora Martins
Debora Martins

Reputation: 243

How to find records that have any duplicate data using Active Record

How to find records with duplicate values in any column using Activerecord or SQL?

SELECT leads.id, leads.name, leads.email, leads.created_at, array_agg(tn2.id) as ids
FROM "leads" join leads tn2
    on leads.name = tn2.name
      or leads.cpf_cnpj = tn2.cpf_cnpj
      or leads.email = tn2.email
      or leads.phone -> 'cellphone' = tn2.phone -> 'cellphone'
      or leads.phone -> 'residence' = tn2.phone -> 'residence'
      or leads.phone -> 'commercial' = tn2.phone -> 'commercial' 
GROUP BY leads.id  ORDER BY leads.created_at DESC

Using array_agg I want only ids from repeated objects, but it gives me from all records. Currently, I'm using PostgreSQL.

Upvotes: 1

Views: 232

Answers (1)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656331

How to find records with duplicate values in any column?

SELECT l.id, l.name, l.email, l.created_at, array_agg(l2.id) AS ids
FROM   leads l
WHERE EXISTS (
   SELECT 1
   FROM   leads 
   WHERE  id <> l.id 
   AND   (
          name = l.name    
   OR     cpf_cnpj = l.cpf_cnpj
   OR     email = l.email
   OR     phone->'cellphone'  = l.phone->'cellphone'
   OR     phone->'residence'  = l.phone->'residence'
   OR     phone->'commercial' = l.phone->'commercial'
         )
   );

But it seems like you want something different:

How to get an array of IDs for each row from rows that have the same value in at least one of several given columns, youngest entry first?

SELECT l.id, l.name, l.email, l.created_at
     , array_agg(l2.id  ORDER BY l2.created_at DESC NULL LAST) AS dupe_ids
FROM   leads l
JOIN   leads l2 ON l2.id <> l.id 
      AND   (
              l2.name = l.name    
       OR     l2.cpf_cnpj = l.cpf_cnpj
       OR     l2.email = l.email
       OR     l2.phone->'cellphone'  = l.phone->'cellphone'
       OR     l2.phone->'residence'  = l.phone->'residence'
       OR     l2.phone->'commercial' = l.phone->'commercial'
             )
GROUP   BY l.id
ORDER   BY l.created_at DESC NULL LAST;

Assuming id is the primary key.

Upvotes: 1

Related Questions