Sara
Sara

Reputation: 8242

MySQL: How to retrieve all rows with a duplicate field, matching a LIKE on a certain field in any matched row

I have a database containing about 300,000 rows of product information.
I need to retrieve rows with duplicate UPCs (COUNT(upc) > 1), where at least one of the results' descriptions matches a certain string ("Reed", for example.)

For example, the following rows would all be selected (desc, upc pairs)

Deer D7394    62226173
Reed R2536    62226173
Deer D7217    62226173

but none of

Deer D0173    62278389
Deer D7289    62278389
Deer D9272    62278389

Here is the query I'm working with:

SELECT a.desc, a.upc, a.sku, a.short_description 
FROM inventory a 
JOIN 
    (SELECT upc, desc 
    FROM inventory 
    GROUP BY upc 
    HAVING COUNT(upc) > 1) b 
ON a.upc = b.upc 
WHERE ((a.desc LIKE '%Reed%') OR (b.desc LIKE '%Reed%'))
AND a.upc != '' 
AND a.upc != 0 
ORDER BY upc;

I am relatively new to MySQL, but this seems like it should work. However, some results are failing to return the non-matching row (i.e. Reed R2536 will be returned, but not Deer D7394).

Any insight would be greatly appreciated!

Upvotes: 2

Views: 904

Answers (3)

Ian Clelland
Ian Clelland

Reputation: 44192

Brian's group_concat method will work, when the number of duplicates is small, but it will fail silently when it's not. You'll never know; you'll just be missing rows that should be present.

What you want to do is select all of the UPCs where at least one description matches (and for which duplicates exist), and then, from that list, select all rows matching each of those UPCs.

If you group all items by UPC, then you can annotate each one with a count, and flag whether any of the descriptions matched:

SELECT upc, COUNT(*) c, MAX(`desc` LIKE '%Reed%') desc_matches
FROM inventory
GROUP BY upc

(This takes advantage of the fact that boolean operators, like LIKE, actually return 0 for false and 1 for true. Taking the maximum of that column tells you whether any row matched)

Then you can filter that list based on your criteria, to get just the UPCs you are interested in:

SELECT upc, COUNT(*) c, MAX(`desc` LIKE '%Reed%') desc_matches
FROM inventory
GROUP BY upc
HAVING desc_matches = 1 AND c > 1

Once you have that list, you want to see all products that match any of those UPCs. You can do that with a simple (not OUTER) join:

SELECT a.desc, a.upc, a.sku, a.short_description 
FROM inventory a 
JOIN 
    ( SELECT upc, COUNT(*) c, MAX(`desc` LIKE '%Reed%') desc_matches
      FROM inventory
      GROUP BY upc
      HAVING desc_matches = 1 AND c > 1
    ) b USING (upc)

Upvotes: 4

Brian Hoover
Brian Hoover

Reputation: 7991

Another possible way to do this, assuming that you don't have TOO many duplicate records would be:

select * from inventory i
  join (
         SELECT upc 
           FROM inventory 
            GROUP BY upc 
            HAVING COUNT(upc) > 1
              and group_concat(`desc`) like '%reed%') as available_upc 
          on available_upc.upc = i.upc

This assumes that your tables looks something like:

CREATE TABLE inventory(
  sku CHAR(32) NOT NULL,
  `desc` CHAR(32) NOT NULL,
  upc CHAR(32) NOT NULL,
  short_description CHAR(32) NOT NULL,
  PRIMARY KEY (sku)
);

insert into inventory values ('D7394','Deer','62226173','Small Deer');
insert into inventory values ('R2536','Reed','62226173','Small Reed');
insert into inventory values ('D7217','Deer','62226173','Large Deer');


insert into inventory values ('D0173','Deer','62278389','Small Deer');
insert into inventory values ('D7289','Deer','62278389','Small Reed');
insert into inventory values ('D9272','Deer','62278389','Large Deer');

Upvotes: 1

Sablefoste
Sablefoste

Reputation: 4192

Hard to tell without testing, but try:

SELECT a.desc, a.upc, a.sku, a.short_description 
FROM inventory a 
OUTER RIGHT JOIN 
    (SELECT upc
    FROM inventory 
    GROUP BY upc 
    HAVING COUNT(upc) > 1) b 
ON a.upc = b.upc 
WHERE ((a.desc LIKE '%Reed%') OR (b.desc LIKE '%Reed%'))
AND a.upc != '' 
AND a.upc != 0 
ORDER BY upc;

The key is the OUTER RIGHT JOIN. Please see the article: http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins

Also, you only need to return the upc from the inner SELECT query.

Upvotes: 0

Related Questions