Reputation: 8242
I have a database containing about 300,000 rows of product information.
I need to retrieve rows with duplicate UPCs (COUNT(upc) > 1), where at least one of the results' descriptions matches a certain string ("Reed", for example.)
For example, the following rows would all be selected (desc, upc pairs)
Deer D7394 62226173
Reed R2536 62226173
Deer D7217 62226173
but none of
Deer D0173 62278389
Deer D7289 62278389
Deer D9272 62278389
Here is the query I'm working with:
SELECT a.desc, a.upc, a.sku, a.short_description
FROM inventory a
JOIN
(SELECT upc, desc
FROM inventory
GROUP BY upc
HAVING COUNT(upc) > 1) b
ON a.upc = b.upc
WHERE ((a.desc LIKE '%Reed%') OR (b.desc LIKE '%Reed%'))
AND a.upc != ''
AND a.upc != 0
ORDER BY upc;
I am relatively new to MySQL, but this seems like it should work. However, some results are failing to return the non-matching row (i.e. Reed R2536 will be returned, but not Deer D7394).
Any insight would be greatly appreciated!
Upvotes: 2
Views: 904
Reputation: 44192
Brian's group_concat
method will work, when the number of duplicates is small, but it will fail silently when it's not. You'll never know; you'll just be missing rows that should be present.
What you want to do is select all of the UPCs where at least one description matches (and for which duplicates exist), and then, from that list, select all rows matching each of those UPCs.
If you group all items by UPC, then you can annotate each one with a count, and flag whether any of the descriptions matched:
SELECT upc, COUNT(*) c, MAX(`desc` LIKE '%Reed%') desc_matches
FROM inventory
GROUP BY upc
(This takes advantage of the fact that boolean operators, like LIKE
, actually return 0
for false and 1
for true. Taking the maximum of that column tells you whether any row matched)
Then you can filter that list based on your criteria, to get just the UPCs you are interested in:
SELECT upc, COUNT(*) c, MAX(`desc` LIKE '%Reed%') desc_matches
FROM inventory
GROUP BY upc
HAVING desc_matches = 1 AND c > 1
Once you have that list, you want to see all products that match any of those UPCs. You can do that with a simple (not OUTER) join:
SELECT a.desc, a.upc, a.sku, a.short_description
FROM inventory a
JOIN
( SELECT upc, COUNT(*) c, MAX(`desc` LIKE '%Reed%') desc_matches
FROM inventory
GROUP BY upc
HAVING desc_matches = 1 AND c > 1
) b USING (upc)
Upvotes: 4
Reputation: 7991
Another possible way to do this, assuming that you don't have TOO many duplicate records would be:
select * from inventory i
join (
SELECT upc
FROM inventory
GROUP BY upc
HAVING COUNT(upc) > 1
and group_concat(`desc`) like '%reed%') as available_upc
on available_upc.upc = i.upc
This assumes that your tables looks something like:
CREATE TABLE inventory(
sku CHAR(32) NOT NULL,
`desc` CHAR(32) NOT NULL,
upc CHAR(32) NOT NULL,
short_description CHAR(32) NOT NULL,
PRIMARY KEY (sku)
);
insert into inventory values ('D7394','Deer','62226173','Small Deer');
insert into inventory values ('R2536','Reed','62226173','Small Reed');
insert into inventory values ('D7217','Deer','62226173','Large Deer');
insert into inventory values ('D0173','Deer','62278389','Small Deer');
insert into inventory values ('D7289','Deer','62278389','Small Reed');
insert into inventory values ('D9272','Deer','62278389','Large Deer');
Upvotes: 1
Reputation: 4192
Hard to tell without testing, but try:
SELECT a.desc, a.upc, a.sku, a.short_description
FROM inventory a
OUTER RIGHT JOIN
(SELECT upc
FROM inventory
GROUP BY upc
HAVING COUNT(upc) > 1) b
ON a.upc = b.upc
WHERE ((a.desc LIKE '%Reed%') OR (b.desc LIKE '%Reed%'))
AND a.upc != ''
AND a.upc != 0
ORDER BY upc;
The key is the OUTER RIGHT JOIN
. Please see the article: http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
Also, you only need to return the upc from the inner SELECT
query.
Upvotes: 0