Devesh Agrawal
Devesh Agrawal

Reputation: 9212

Mysql left join is not working as expected

3 tables.

table_customers - customer_id, name
table_orders - order_id, customer_id, order_datetime
table_wallet - customer_id, amount, type  // type 1- credit, type 2- debit

Need to get all customers, their total balance, and their last order date and order id. If customer have not placed any return order date as 0000-00-00 and order id as 0.

This is my query.

SELECT 
C.customer_id, 
C.name, 
COALESCE( SUM(CASE WHEN type = 2 THEN -W.amount ELSE W.amount END), 0) AS value,
COALESCE( max( O.order_id  ) , '0' ) AS last_order_id, 
COALESCE( max( date( O.order_datetime ) ) , '0000-00-00' ) AS last_order_date
FROM 
table_customers as C 
LEFT JOIN 
table_wallet as W 
ON C.customer_id = W.customer_id 
LEFT JOIN
table_orders AS O
ON W.customer_id = O.customer_id
group by C.customer_id
ORDER BY C.customer_id

Everything is coming correct except customer's value. From result it seems its getting added multiple times.

I have created the fiddle here. http://sqlfiddle.com/#!9/560f2/1

What is wrong in query? Can anyone help me on this?

Edit: Expected result

customer_id name    value   last_order_id     last_order_date
  1         abc     20        3             2016-06-22
  2         def     112.55    0             0000-00-00
  3         pqrs      0       4             2016-06-15
  4         wxyz      0       0             0000-00-00

Upvotes: 1

Views: 54

Answers (3)

O. Jones
O. Jones

Reputation: 108651

This is the classic combinatorial explosion problem when you JOIN tables containing unrelated data.

You need to compute each customer's balance in a subquery. That subquery must yield either one row or zero rows per customer_id. It might look like this. (http://sqlfiddle.com/#!9/560f2/8/0)

      SELECT customer_id, 
             SUM(CASE WHEN type = 2 THEN -amount ELSE amount END) AS value
        FROM table_wallet
       GROUP BY customer_id

Similarly, you need to retrieve each customer's latest order in a subquery (http://sqlfiddle.com/#!9/560f2/10/0) . Again, it needs either one row or zero rows per customer_id.

      SELECT customer_id,
             MAX(order_id) AS order_id,
             DATE(MAX(order_datetime)) AS order_date
        FROM table_orders
       GROUP BY customer_id

Then, you can LEFT JOIN those two subqueries as if they were tables, to your table_customers. The subqueries are tables; they're virtual tables. (http://sqlfiddle.com/#!9/560f2/12/0)

SELECT c.customer_id,
       c.name,
       w.value,
       o.order_id,
       o.order_date
  FROM table_customers c
  LEFT JOIN (
           SELECT customer_id, 
                  SUM(CASE WHEN type = 2 THEN -amount ELSE amount END) AS value
             FROM table_wallet
            GROUP BY customer_id
       ) w ON c.customer_id = w.customer_id
  LEFT JOIN (
           SELECT customer_id,
                  MAX(order_id) AS order_id,
                  DATE(MAX(order_datetime)) AS order_date
             FROM table_orders
            GROUP BY customer_id
       ) o ON c.customer_id = o.customer_id 

Your mistake was this: you joined two tables each with multiple rows for each customer id. For example, a particular customer might have had two orders and three wallet rows. Then, the join results in six rows representing all the possible combinations of wallet and order rows. That's called combinatorial explosion.

The solution I outlined makes sure there's only one row (or maybe no rows) to join for each customer_id, and so eliminates the combinatorial explosion.

Pro tip: Using subqueries like this makes it easy to test your query: you can test each subquery separately.

Upvotes: 2

Nick
Nick

Reputation: 679

To further illustrate from the previous answers, if we simply remove your group by statement, you can easily see why you are double counting. The following code:

SELECT 
C.*,
O.order_id, O.order_datetime,
W.amount, W.type
FROM 
table_customers as C 
LEFT JOIN 
table_wallet as W 
ON C.customer_id = W.customer_id 
LEFT JOIN
table_orders AS O
ON W.customer_id = O.customer_id

Will yield the result:

customer_id   name  order_id    order_datetime           amount   type
1             abc   1           April, 22 2016 23:53:09  20       1
1             abc   2           May, 22 2016 23:53:09    20       1
1             abc   3           June, 22 2016 23:53:09   20       1
2             def   (null)      (null)                   100      1
2             def   (null)      (null)                   12.55    1
3             pqrs  (null)      (null)                   (null)   (null)
4             wxyz  (null)      (null)                   (null)   (null)

Note the duplication of Customer ID 1 with amount 20.

Upvotes: 2

jpw
jpw

Reputation: 44881

The issue is that the join between orders and wallet will produce as many rows as there as orders for each wallet, when you really just want one row per wallet from the order table (since you only use the max values). In your test case you get 3 rows for customer 1 which makes the sum 60 (3*20).

One way to solve this is to change to this:

SELECT 
  C.customer_id, 
  C.name, 
  COALESCE( SUM(CASE WHEN type = 2 THEN -W.amount ELSE W.amount END), 0) AS value,
  COALESCE( O.order_id , '0' ) AS last_order_id, 
  COALESCE( DATE( O.order_datetime ) , '0000-00-00' ) AS last_order_date
FROM table_customers AS C 
LEFT JOIN table_wallet AS W ON C.customer_id = W.customer_id 
LEFT JOIN (
  SELECT 
    customer_id, 
    MAX(order_id) AS order_id, 
    MAX(order_datetime) AS order_datetime
  FROM table_orders 
  GROUP BY customer_id
) AS O ON c.customer_id = O.customer_id
GROUP BY C.customer_id
ORDER BY C.customer_id

As you see the orders table is replaced by a derived table that gets you one row per customer.

Running the query above gets you the following result:

| customer_id | name |  value | last_order_id | last_order_date |
|-------------|------|--------|---------------|-----------------|
|           1 |  abc |     20 |             3 |      2016-06-22 |
|           2 |  def | 112.55 |             0 |      0000-00-00 |
|           3 | pqrs |      0 |             4 |      2016-06-15 |
|           4 | wxyz |      0 |             0 |      0000-00-00 |

Upvotes: 2

Related Questions