Rob Emenecker
Rob Emenecker

Reputation: 139

mysql select for simple join for rows having certain cells with duplicate values

I am trying to do a MySQL SELECT with two tables that results in showing columns from each of the tables only when there are multiple rows with two of the columns have duplicated values.

The following SELECT statement gives customer_id, order_id, product_ids and skus for all orders placed by a customer...

select orders2.customer_id, items2.order_id, items2.product_id, items2.sku 
from orders orders2, order_items items2
where orders2.status = 'complete' 
    and orders2.customer_id is not null
    and orders2.entity_id = items2.order_id
order by orders2.customer_id ASC, items2.product_id ASC;

Giving me results of...

+-------------+----------+------------+--------------------------+
| customer_id | order_id | product_id | sku                      |
+-------------+----------+------------+--------------------------+
|       29813 |    38025 |        306 | BB_MAT101                |
|       29813 |    38027 |        309 | BB_MAT250                |
|       29814 |    28844 |        302 | BB_ENG101                |
|       29814 |    27615 |        384 | BB_MS-ACC101             |
|       29814 |    27616 |        385 | BB_MS-ACC102             |
|       29814 |    27615 |        385 | BB_MS-ACC102             |
|       29814 |    27614 |        409 | BB_MS-MAT101             |
|       29814 |    27584 |        410 | BB_MS-MAT150             |
|       29815 |    27592 |        384 | BB_MS-ACC101             |
|       29815 |    27593 |        384 | BB_MS-ACC101             |
|       29815 |    27594 |        384 | BB_MS-ACC101             |
|       29815 |    27599 |        385 | BB_MS-ACC102             |
|       29815 |    27592 |        402 | BB_MS-ECON101            |
|       29815 |    27593 |        402 | BB_MS-ECON101            |
|       29815 |    27594 |        402 | BB_MS-ECON101            |
|       29815 |    27596 |        403 | BB_MS-ECON102            |
|       29815 |    27598 |        404 | BB_MS-ENG099             |
|       29815 |    27588 |        405 | BB_MS-ENG101             |
|       29815 |    27595 |        406 | BB_MS-ENG102             |
|       29815 |    27589 |        408 | BB_MS-MAT099             |
|       29815 |    27585 |        409 | BB_MS-MAT101             |
|       29815 |    27589 |        410 | BB_MS-MAT150             |
|       29815 |    27589 |        411 | BB_MS-MAT201             |
+-------------+----------+------------+--------------------------+

The following SELECT lets me determine those cases where a customer has ordered a product more than once...

select orders1.customer_id as dupe_customer_id, items1.product_id as dupe_product_id, count(*) as duplicates
from orders orders1, order_items items1
where orders1.status = 'complete' 
    and orders1.customer_id is not null
    and orders1.entity_id = items1.order_id
group by orders1.customer_id, items1.product_id
having duplicates > 1;

With results of...

+------------------+-----------------+------------+
| dupe_customer_id | dupe_product_id | duplicates |
+------------------+-----------------+------------+
|            29814 |             385 |          2 |
|            29815 |             384 |          3 |
|            29815 |             402 |          3 |
+------------------+-----------------+------------+

What I am trying to puzzle through is how to combine these so that I only get those items in the FIRST select where they meet the conditions of the second select, such that the output would be something like...

+-------------+----------+------------+--------------------------+
| customer_id | order_id | product_id | sku                      |
+-------------+----------+------------+--------------------------+
|       29814 |    27616 |        385 | BB_MS-ACC102             |
|       29814 |    27615 |        385 | BB_MS-ACC102             |
|       29815 |    27592 |        384 | BB_MS-ACC101             |
|       29815 |    27593 |        384 | BB_MS-ACC101             |
|       29815 |    27594 |        384 | BB_MS-ACC101             |
|       29815 |    27592 |        402 | BB_MS-ECON101            |
|       29815 |    27593 |        402 | BB_MS-ECON101            |
|       29815 |    27594 |        402 | BB_MS-ECON101            |
+-------------+----------+------------+--------------------------+

Note that the main criteria of the result set is that any combination of customer_id AND product_id MUST occur more than once.

I'm struggling with how to combine them. My attempts have resulted in one (or more) of the columns have incorrect data repeated across every row in the results.

I've hit the wall with my MySQL knowledge on doing this type of SELECT and searches over the past hour have not yielded anything.

Upvotes: 2

Views: 81

Answers (2)

Rob Emenecker
Rob Emenecker

Reputation: 139

Here is what I eventually came up with, with the help of PlantTheldea's solution. My solution has additional left joins in place to provide the full report that I was looking for. (They were not relevant to the original question. However, it is easier to just include them here, rather than try to edit them out.)

CREATE TEMPORARY TABLE temp_customer_items 
    (PRIMARY KEY my_pkey (customer_id, product_id)) 
SELECT orderSub.customer_id 
    , itemSub.product_id 
FROM orders AS orderSub
INNER JOIN items AS itemSub 
    ON orderSub.entity_id = itemSub.order_id 
WHERE orderSub.entity_id = itemSub.order_id 
    AND orderSub.customer_id IS NOT NULL 
    AND orderSub.status = 'complete'  
    AND itemSub.product_type = 'simple' 
GROUP BY orderSub.customer_id 
    , itemSub.product_id 
HAVING COUNT(*) > 1;
SELECT orders.customer_id AS student_id
    , fname.value AS first_name 
    , mname.value AS middle_name 
    , lname.value AS last_name 
    , customers.email 
    , items.product_id 
    , items.sku AS product_sku
    , orders.increment_id AS order_id 
    , orders.updated_at AS order_date
FROM orders AS orders 
LEFT JOIN customers AS customers 
    ON orders.customer_id = customers.entity_id 
INNER JOIN items AS items  
    ON orders.entity_id = items.order_id 
    WHERE (orders.customer_id,items.product_id) IN ( 
        SELECT temp_customer_items.customer_id 
            , temp_customer_items.product_id 
        FROM temp_customer_items )
ORDER BY last_name ASC 
    , first_name ASC 
    , middle_name ASC 
    , orders.updated_at DESC 
    , items.order_id ASC;
DROP TEMPORARY TABLE temp_customer_items; 

Upvotes: 0

PlantTheIdea
PlantTheIdea

Reputation: 16359

Try giving something like this a try:

SELECT orders.customer_id
    ,items.order_id
    ,items.product_id
    ,items.sku 
FROM orders AS orders
INNER JOIN order_items AS items ON orders.entity_id = items.order_id
WHERE (orders.customer_id,items.product_id) IN (
    SELECT orderSub.customer_id
        ,itemSub.product_id
    FROM orders AS orderSub
    INNER JOIN order_items AS itemSub ON orderSub.entity_id = itemSub.order_id
    WHERE orderSub.status = 'complete' 
        AND orderSub.customer_id IS NOT NULL
    GROUP BY orderSub.customer_id
        ,itemSub.product_id
    HAVING COUNT(*) > 1;
);

It is providing the query as a subquery for a multi-column match for the IN clause to work with. This is off the top of my head as I'm not sure if MySQL supports this syntax, but it works in Teradata and DB2.

I also converted your query to maximize ANSI compliance for performance reasons, so if it looks different other than the WHERE IN clause that is why.

Upvotes: 1

Related Questions