Reputation: 139
I am trying to do a MySQL SELECT with two tables that results in showing columns from each of the tables only when there are multiple rows with two of the columns have duplicated values.
The following SELECT statement gives customer_id, order_id, product_ids and skus for all orders placed by a customer...
select orders2.customer_id, items2.order_id, items2.product_id, items2.sku
from orders orders2, order_items items2
where orders2.status = 'complete'
and orders2.customer_id is not null
and orders2.entity_id = items2.order_id
order by orders2.customer_id ASC, items2.product_id ASC;
Giving me results of...
+-------------+----------+------------+--------------------------+
| customer_id | order_id | product_id | sku |
+-------------+----------+------------+--------------------------+
| 29813 | 38025 | 306 | BB_MAT101 |
| 29813 | 38027 | 309 | BB_MAT250 |
| 29814 | 28844 | 302 | BB_ENG101 |
| 29814 | 27615 | 384 | BB_MS-ACC101 |
| 29814 | 27616 | 385 | BB_MS-ACC102 |
| 29814 | 27615 | 385 | BB_MS-ACC102 |
| 29814 | 27614 | 409 | BB_MS-MAT101 |
| 29814 | 27584 | 410 | BB_MS-MAT150 |
| 29815 | 27592 | 384 | BB_MS-ACC101 |
| 29815 | 27593 | 384 | BB_MS-ACC101 |
| 29815 | 27594 | 384 | BB_MS-ACC101 |
| 29815 | 27599 | 385 | BB_MS-ACC102 |
| 29815 | 27592 | 402 | BB_MS-ECON101 |
| 29815 | 27593 | 402 | BB_MS-ECON101 |
| 29815 | 27594 | 402 | BB_MS-ECON101 |
| 29815 | 27596 | 403 | BB_MS-ECON102 |
| 29815 | 27598 | 404 | BB_MS-ENG099 |
| 29815 | 27588 | 405 | BB_MS-ENG101 |
| 29815 | 27595 | 406 | BB_MS-ENG102 |
| 29815 | 27589 | 408 | BB_MS-MAT099 |
| 29815 | 27585 | 409 | BB_MS-MAT101 |
| 29815 | 27589 | 410 | BB_MS-MAT150 |
| 29815 | 27589 | 411 | BB_MS-MAT201 |
+-------------+----------+------------+--------------------------+
The following SELECT lets me determine those cases where a customer has ordered a product more than once...
select orders1.customer_id as dupe_customer_id, items1.product_id as dupe_product_id, count(*) as duplicates
from orders orders1, order_items items1
where orders1.status = 'complete'
and orders1.customer_id is not null
and orders1.entity_id = items1.order_id
group by orders1.customer_id, items1.product_id
having duplicates > 1;
With results of...
+------------------+-----------------+------------+
| dupe_customer_id | dupe_product_id | duplicates |
+------------------+-----------------+------------+
| 29814 | 385 | 2 |
| 29815 | 384 | 3 |
| 29815 | 402 | 3 |
+------------------+-----------------+------------+
What I am trying to puzzle through is how to combine these so that I only get those items in the FIRST select where they meet the conditions of the second select, such that the output would be something like...
+-------------+----------+------------+--------------------------+
| customer_id | order_id | product_id | sku |
+-------------+----------+------------+--------------------------+
| 29814 | 27616 | 385 | BB_MS-ACC102 |
| 29814 | 27615 | 385 | BB_MS-ACC102 |
| 29815 | 27592 | 384 | BB_MS-ACC101 |
| 29815 | 27593 | 384 | BB_MS-ACC101 |
| 29815 | 27594 | 384 | BB_MS-ACC101 |
| 29815 | 27592 | 402 | BB_MS-ECON101 |
| 29815 | 27593 | 402 | BB_MS-ECON101 |
| 29815 | 27594 | 402 | BB_MS-ECON101 |
+-------------+----------+------------+--------------------------+
Note that the main criteria of the result set is that any combination of customer_id AND product_id MUST occur more than once.
I'm struggling with how to combine them. My attempts have resulted in one (or more) of the columns have incorrect data repeated across every row in the results.
I've hit the wall with my MySQL knowledge on doing this type of SELECT and searches over the past hour have not yielded anything.
Upvotes: 2
Views: 81
Reputation: 139
Here is what I eventually came up with, with the help of PlantTheldea's solution. My solution has additional left joins in place to provide the full report that I was looking for. (They were not relevant to the original question. However, it is easier to just include them here, rather than try to edit them out.)
CREATE TEMPORARY TABLE temp_customer_items
(PRIMARY KEY my_pkey (customer_id, product_id))
SELECT orderSub.customer_id
, itemSub.product_id
FROM orders AS orderSub
INNER JOIN items AS itemSub
ON orderSub.entity_id = itemSub.order_id
WHERE orderSub.entity_id = itemSub.order_id
AND orderSub.customer_id IS NOT NULL
AND orderSub.status = 'complete'
AND itemSub.product_type = 'simple'
GROUP BY orderSub.customer_id
, itemSub.product_id
HAVING COUNT(*) > 1;
SELECT orders.customer_id AS student_id
, fname.value AS first_name
, mname.value AS middle_name
, lname.value AS last_name
, customers.email
, items.product_id
, items.sku AS product_sku
, orders.increment_id AS order_id
, orders.updated_at AS order_date
FROM orders AS orders
LEFT JOIN customers AS customers
ON orders.customer_id = customers.entity_id
INNER JOIN items AS items
ON orders.entity_id = items.order_id
WHERE (orders.customer_id,items.product_id) IN (
SELECT temp_customer_items.customer_id
, temp_customer_items.product_id
FROM temp_customer_items )
ORDER BY last_name ASC
, first_name ASC
, middle_name ASC
, orders.updated_at DESC
, items.order_id ASC;
DROP TEMPORARY TABLE temp_customer_items;
Upvotes: 0
Reputation: 16359
Try giving something like this a try:
SELECT orders.customer_id
,items.order_id
,items.product_id
,items.sku
FROM orders AS orders
INNER JOIN order_items AS items ON orders.entity_id = items.order_id
WHERE (orders.customer_id,items.product_id) IN (
SELECT orderSub.customer_id
,itemSub.product_id
FROM orders AS orderSub
INNER JOIN order_items AS itemSub ON orderSub.entity_id = itemSub.order_id
WHERE orderSub.status = 'complete'
AND orderSub.customer_id IS NOT NULL
GROUP BY orderSub.customer_id
,itemSub.product_id
HAVING COUNT(*) > 1;
);
It is providing the query as a subquery for a multi-column match for the IN clause to work with. This is off the top of my head as I'm not sure if MySQL supports this syntax, but it works in Teradata and DB2.
I also converted your query to maximize ANSI compliance for performance reasons, so if it looks different other than the WHERE IN clause that is why.
Upvotes: 1