JKL
JKL

Reputation: 27

SQL query to find the total number of books sold for each day by merging 2 tables?

I'm trying to find total books sold (sum of quantities) for each product_id and txn_day. I have 2 tables, transactions and catalog as below:

Table 1: transaction

market_id, txn_day, customer_id, product_id, quantity
1,2019-03-01,1,B0002,1
1,2019-03-01,2,B0003,1
1,2019-03-01,1,B0001,1
3,2019-03-01,3,B0001,1
3,2019-03-01,4,B0002,1
4,2019-03-01,1,B0002,1
4,2019-03-01,5,B0001,1
4,2019-03-01,6,B0001,1

Table 2: catalog

market_id, product_id, title_name
1,B0001, Harry Potter 1
1,B0002, Harry Potter 2
1,B0003, Harry Potter 3
3,B0001, Harry Potter 1
3,B0002, Harry Potter 2
3,B0003, Harry Potter 3
4,B0001, Harry Potter 1
4,B0002, Harry Potter 2
4,B0003, Harry Potter 3

I wrote the following query and got the total books sold (sum of quantities) for product_id:

SELECT 
    transaction.txn_day, transaction.product_id,
    SUM(quantity) AS quantity 
FROM 
    transaction
GROUP BY
    transaction.product_id, transaction.txn_day;

I tried following query to get title_name for each product_id, but it seems incorrect.

SELECT 
    transaction.txn_day, transaction.product_id, catalog.title_name,
    SUM(quantity) AS quantity 
FROM 
    catalog
INNER JOIN 
    transaction ON catalog.product_id = transaction.product_id
GROUP BY
    transaction.txn_day, transaction.product_id, catalog.title_name;

I'm getting the following result:

|txn_day    |product_id |title_name     |quantity
|2019-03-01 |B0002      |Harry Potter 2 |9
|2019-03-01 |B0001      |Harry Potter 1 |12
|2019-03-01 |B0003      |Harry Potter 3 |3

I'm expecting the result to be something like:

|txn_day    | product_id | quantity | title_name
|2019-03-01 | B0003      | 1        | Harry Potter 3
|2019-03-01 | B0002      | 3        | Harry Potter 2
|2019-03-01 | B0001      | 4        | Harry Potter 1

Please suggest any changes required to the query.

Upvotes: 0

Views: 1045

Answers (4)

bunchesNbunches
bunchesNbunches

Reputation: 93

I loaded your data and got the following results:

DATE        ID      TITLE           QTY
3/1/2019    B0002   Harry Potter 2  9
3/1/2019    B0003   Harry Potter 3  3
3/1/2019    B0001   Harry Potter 1  12

Your clue here as to what is wrong is that your Qty is 3x too much for each record.

If you look at your INNER JOIN you are only joining by the product_id.

Take a look at your transaction rows.

You can see that each product_id exists along with 3 different market_ids, that is why your qty is tripling.

Solution: Add the market_id to the JOIN:

SELECT   t.txn_day
        ,t.product_id
        ,c.title_name
        ,SUM(quantity) AS quantity
FROM     catalog c
         INNER JOIN transactions t ON c.product_id = t.product_id AND c.market_id = t.market_id
GROUP BY t.txn_day, t.product_id, c.title_name
order by c.title_name;

You will get your results:

3/1/2019    B0001   Harry Potter 1  4
3/1/2019    B0002   Harry Potter 2  3
3/1/2019    B0003   Harry Potter 3  1

Upvotes: 2

Erwin Brandstetter
Erwin Brandstetter

Reputation: 658042

Your data suggests that product_id alone is the key to title_name. So:

SELECT t.*, c.title_name
FROM  (
   SELECT txn_day, product_id, sum(quantity) AS sum_quantity
   FROM   transaction
   GROUP  BY txn_day, product_id
   ) t
LEFT   JOIN (
   SELECT DISTINCT product_id, title_name
   FROM   catalog
   ) c USING (product_id);

You should really have a table product listing distinct products.
(Unless your sample data is misleading and only the combination (market_id, product_id) is unique - in which case it wouldn't make sense to omit market_id in the aggregation. Hence I am back to my first assumption.)

Barring that, I use the subquery SELECT DISTINCT product_id, title_name FROM catalog to derive that table on the fly.

LEFT JOIN instead of just JOIN is an insurance against eliminating rows if a product_id is not found in table catalog.

Also, it's typically cheaper to aggregate first and join later. See:

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270513

I think you need to join on two columns, not one -- both the market and the product:

SELECT t.txn_day, product_id, c.title_name, SUM(quantity) as quantity
FROM catalog c INNER JOIN
     transaction t
     USING (product_id, market_id)
GROUP BY t.txn_day, product_id, c.title_name;

This uses the USING clause to simplify the JOIN condition. It also introduces table aliases so the query is easier to write and to read.

Upvotes: 1

June7
June7

Reputation: 21389

It's wrong because product_id and title_name are not unique in catalog. Consider:

SELECT transaction.txn_day, transaction.product_id, title_name,
sum(quantity) as qty FROM (SELECT DISTINCT product_id, title_name FROM catalog) AS cat
INNER JOIN transaction ON cat.product_id=transaction.product_id
group by transaction.txn_day, transaction.product_id, catalog.title_name;

Also, a field alias same as native field name might cause issue in some systems, it does in Access.

Upvotes: 1

Related Questions