Reputation: 28454
Let's say I have a database of products that have the following columns:
name
isbn
ean
upc
sku
And I want to group products together if any of those columns match. How can I do that?
I tried using the GROUP BY
clause, but apparently that means that all of the columns have to match, not just a single one.
In other words, GROUP BY name, isbn, ean, upc, sku
only groups products together when all of the columns are equal, where I'm wanting to group them together if any of the columns are equal.
Example:
id | name | isbn | ean | upc | sku
0 banana1 null null 0000000000 1234
1 banana2 6789 null 0000000000 1234
2 banana3 6789 null 1111111111 null
3 banana4 null null 1111111111 null
All of those rows should be grouped together because they have at least one matching column with respect to each other.
Upvotes: 0
Views: 137
Reputation: 1271111
This is a graph traversal problem and painful to implement in SQL -- and especially MySQL which doesn't have support for hierarchical queries.
There is an iterative solution that is not too bad. The idea is to add a column "groupid" and then continually update. The rule for the update is to initially populate it with the id and then assign it the minimum along any of the dimensions. Here is an example of code:
update products
set groupid = id;
/* now run this multiple times until there are no more updates */
update products p
set groupid = (select min(groupid)
from products p2
where p.name = p2.name or
p.isbn = p2.isbn or
p.ean = p2.ean or
p.upc = p2.upc or
p.sku = p2.sku
);
Upvotes: 1
Reputation: 9272
Nope. GROUP BY
requires that all elements in the grouped expression are identical amongst rows in that group. The best I can think of is:
SELECT
*
FROM
table
WHERE
name = isbn
OR name = ean
OR name = upc
OR name = sku
OR isbn = ean
OR isbn = upc
OR isbn = sku
OR ean = upc
OR ean = sku
OR upc = sku
The downside, as you can see, is that this is phenomenally ugly, and doesn't scale well when adding a new column. I would only use this if it was a one-off script--for example, to identify data corruption.
Upvotes: 2