Reputation: 79
In the query below I'm trying to select data but need it to be distinct on the columns Book.Title and the OrderDate. I have tried using DISTINCT ON but also get multiple results back which I will also post below. Sorry if the question is a bit trivial. I'm currently in the process of learning SQL so I'm a bit of a noob when it comes to what is probably something really obvious I have missed.
How could I change the query to get the results to be formatted as so:
month | title | quantity | total_value
-----------+---------------------------------------------+----------+-------------
February | Internet and World Wide Web: How to Program | 15 | 899.70
March | C How To Program | 4 | 183.92
March | Core Servlets and JavaServer Pages | 13 | 856.70
March | Internet and World Wide Web: How to Program | 21 | 1071.58
The query I constructed:
SELECT DISTINCT ON (OrderDate, Book.Title) Book.Title, to_char(OrderDate, 'Month') AS "Order Date",
OrderLine.Quantity AS "Order Quantity", (SUM(Quantity*UnitSellingPrice)) AS "Total Value"
FROM Book
INNER JOIN Publisher
ON Book.PublisherID=Publisher.PublisherID
INNER JOIN OrderLine
ON Book.BookID=OrderLine.BookID
INNER JOIN ShopOrder
ON OrderLine.ShopOrderID=ShopOrder.ShopOrderID
WHERE Publisher.Name='Prentice Hall'
GROUP BY book.title, OrderDate, orderline.quantity
ORDER BY OrderDate ASC;
The results I get:
title | Order Date | Order Quantity | Total Value
---------------------------------------------+------------+----------------+-------------
Internet and World Wide Web: How to Program | February | 10 | 299.90
Internet and World Wide Web: How to Program | February | 5 | 149.95
C How To Program | March | 3 | 68.97
Core Servlets and JavaServer Pages | March | 10 | 329.50
Internet and World Wide Web: How to Program | March | 20 | 519.80
C How To Program | March | 1 | 22.99
Core Servlets and JavaServer Pages | March | 3 | 98.85
Internet and World Wide Web: How to Program | March | 1 | 15.99
Thanks so much for any help!
Upvotes: 0
Views: 767
Reputation: 61865
To see what is going on, consider:
SELECT DISTINCT ON (OrderDate, Book.Title)
Book.Title,
OrderDate as "Actual Order Date",
to_char(OrderDate, 'Month') AS "Order Month", ..
That is, the result in the post SELECTs a different "Order Date", really the order month, but is not the actual OrderDate
value used in the DISTINCT/GROUP BY process which explains both the duplicates and incorrect aggregate results.
I suspect the desired query goes something like the following, and I've yet to find a need to use DISTINCT with GROUP BY.
SELECT
-- These are in the GROUP BY and will be DISTINCT in the result
Book.Title,
to_char(OrderDate, 'Month') AS "Order Month",
-- Non GROUP BY values should be aggregated;
-- In this case the aggregates are over Title/Month
SUM(OrderLine.Quantity) AS "Order Quantity",
SUM(Quantity*UnitSellingPrice) AS "Total Value"
FROM Book
INNER JOIN Publisher
ON Book.PublisherID=Publisher.PublisherID
INNER JOIN OrderLine
ON Book.BookID=OrderLine.BookID
INNER JOIN ShopOrder
ON OrderLine.ShopOrderID=ShopOrder.ShopOrderID
WHERE Publisher.Name='Prentice Hall'
-- For each Title/Month
GROUP BY book.title,
to_char(OrderDate, 'Month') -- also grouped on Month
ORDER BY to_char(OrderDate, 'Month') ASC; -- and sorted on Month
Upvotes: 1
Reputation: 97638
I think you're over-complicating things here: if you want the total quantity for each title and month, then you want to group by the title and month, and sum the quantity. You don't want to group by the quantity itself, because that would mean a new row for each distinct quantity.
That's easy to do in standard SQL (DISTINCT ON
is a Postgres extension, useful in a few cases instead of GROUP BY
, but unnecessary here):
SELECT
Book.Title,
to_char(OrderDate, 'Month') AS "Order Date",
SUM(OrderLine.Quantity) AS "Order Quantity",
SUM(Quantity*UnitSellingPrice) AS "Total Value"
FROM [...]
WHERE Publisher.Name='Prentice Hall'
GROUP BY
book.title,
to_char(OrderDate, 'Month')
ORDER BY to_char(OrderDate, 'Month') ASC;
The rule of thumb is that everything in your SELECT
clause - and your ORDER BY
clause, which is considered to be "at the same level" - should be either:
GROUP BY
clause because you want a new row every time it variesSUM()
) because you want to "roll it up" according to some rule.Upvotes: 2