Reputation: 27
I'm trying to find total books sold (sum of quantities) for each product_id
and txn_day
. I have 2 tables, transactions
and catalog
as below:
Table 1: transaction
market_id, txn_day, customer_id, product_id, quantity
1,2019-03-01,1,B0002,1
1,2019-03-01,2,B0003,1
1,2019-03-01,1,B0001,1
3,2019-03-01,3,B0001,1
3,2019-03-01,4,B0002,1
4,2019-03-01,1,B0002,1
4,2019-03-01,5,B0001,1
4,2019-03-01,6,B0001,1
Table 2: catalog
market_id, product_id, title_name
1,B0001, Harry Potter 1
1,B0002, Harry Potter 2
1,B0003, Harry Potter 3
3,B0001, Harry Potter 1
3,B0002, Harry Potter 2
3,B0003, Harry Potter 3
4,B0001, Harry Potter 1
4,B0002, Harry Potter 2
4,B0003, Harry Potter 3
I wrote the following query and got the total books sold (sum of quantities) for product_id
:
SELECT
transaction.txn_day, transaction.product_id,
SUM(quantity) AS quantity
FROM
transaction
GROUP BY
transaction.product_id, transaction.txn_day;
I tried following query to get title_name
for each product_id
, but it seems incorrect.
SELECT
transaction.txn_day, transaction.product_id, catalog.title_name,
SUM(quantity) AS quantity
FROM
catalog
INNER JOIN
transaction ON catalog.product_id = transaction.product_id
GROUP BY
transaction.txn_day, transaction.product_id, catalog.title_name;
I'm getting the following result:
|txn_day |product_id |title_name |quantity
|2019-03-01 |B0002 |Harry Potter 2 |9
|2019-03-01 |B0001 |Harry Potter 1 |12
|2019-03-01 |B0003 |Harry Potter 3 |3
I'm expecting the result to be something like:
|txn_day | product_id | quantity | title_name
|2019-03-01 | B0003 | 1 | Harry Potter 3
|2019-03-01 | B0002 | 3 | Harry Potter 2
|2019-03-01 | B0001 | 4 | Harry Potter 1
Please suggest any changes required to the query.
Upvotes: 0
Views: 1045
Reputation: 93
I loaded your data and got the following results:
DATE ID TITLE QTY
3/1/2019 B0002 Harry Potter 2 9
3/1/2019 B0003 Harry Potter 3 3
3/1/2019 B0001 Harry Potter 1 12
Your clue here as to what is wrong is that your Qty is 3x too much for each record.
If you look at your INNER JOIN you are only joining by the product_id.
Take a look at your transaction rows.
You can see that each product_id exists along with 3 different market_ids, that is why your qty is tripling.
Solution: Add the market_id to the JOIN:
SELECT t.txn_day
,t.product_id
,c.title_name
,SUM(quantity) AS quantity
FROM catalog c
INNER JOIN transactions t ON c.product_id = t.product_id AND c.market_id = t.market_id
GROUP BY t.txn_day, t.product_id, c.title_name
order by c.title_name;
You will get your results:
3/1/2019 B0001 Harry Potter 1 4
3/1/2019 B0002 Harry Potter 2 3
3/1/2019 B0003 Harry Potter 3 1
Upvotes: 2
Reputation: 658042
Your data suggests that product_id
alone is the key to title_name
. So:
SELECT t.*, c.title_name
FROM (
SELECT txn_day, product_id, sum(quantity) AS sum_quantity
FROM transaction
GROUP BY txn_day, product_id
) t
LEFT JOIN (
SELECT DISTINCT product_id, title_name
FROM catalog
) c USING (product_id);
You should really have a table product
listing distinct products.
(Unless your sample data is misleading and only the combination (market_id, product_id)
is unique - in which case it wouldn't make sense to omit market_id
in the aggregation. Hence I am back to my first assumption.)
Barring that, I use the subquery
SELECT DISTINCT product_id, title_name FROM catalog
to derive that table on the fly.
LEFT JOIN
instead of just JOIN
is an insurance against eliminating rows if a product_id
is not found in table catalog
.
Also, it's typically cheaper to aggregate first and join later. See:
Upvotes: 1
Reputation: 1270513
I think you need to join on two columns, not one -- both the market and the product:
SELECT t.txn_day, product_id, c.title_name, SUM(quantity) as quantity
FROM catalog c INNER JOIN
transaction t
USING (product_id, market_id)
GROUP BY t.txn_day, product_id, c.title_name;
This uses the USING
clause to simplify the JOIN
condition. It also introduces table aliases so the query is easier to write and to read.
Upvotes: 1
Reputation: 21389
It's wrong because product_id and title_name are not unique in catalog. Consider:
SELECT transaction.txn_day, transaction.product_id, title_name,
sum(quantity) as qty FROM (SELECT DISTINCT product_id, title_name FROM catalog) AS cat
INNER JOIN transaction ON cat.product_id=transaction.product_id
group by transaction.txn_day, transaction.product_id, catalog.title_name;
Also, a field alias same as native field name might cause issue in some systems, it does in Access.
Upvotes: 1