Reputation: 33
I'm having some issues when i try to obtain the MAX value of a field withing a set of records and i hope some of you can help me finding what am i doing wrong.
I'm trying to get the ID of the item of the most expensive line, within an order.
Given this query:
SELECT
orderHeader.orderKey, orderLines.lineKey, orderLines.itemKey, orderLines.OrderedQty,
orderLines.price, (orderLines.price*orderLines.OrderedQty) as LinePrice,
ROW_NUMBER() OVER(PARTITION BY orderHeader.orderKey ORDER BY orderLines.lineKey asc) AS [ItemLineNum],
ROW_NUMBER() OVER(PARTITION BY orderHeader.orderKey ORDER BY (orderLines.price*orderLines.OrderedQty) DESC) AS [LineMaxPriceNum],
max(orderLines.itemKey) OVER (PARTITION BY orderHeader.orderKey ORDER BY (orderLines.price*orderLines.OrderedQty) DESC) as [MaxPriceItem]
FROM
orderHeader inner join orderLines on orderHeader.orderKey=orderLines.orderKey
I'm getting this results: Results of Query
Sorry, as i'm not allowed to insert images directly in the post, i'll try with snippets for formatting the tables.
These are the results
| orderKey | lineKey | itemKey | OrderedQty | Price | LinePrice | ItemLineNum | LineMaxPriceNum | MaxPriceItem |
|----------|---------|---------|------------|-------|-----------|-------------|-----------------|--------------|
| 176141 | 367038 | 15346 | 3 | 1000 | 3000 | 2 | 1 | 15346 |
| 176141 | 367037 | 15159 | 2 | 840 | 1680 | 1 | 2 | 15346 |
| 176141 | 367039 | 15374 | 5 | 100 | 500 | 3 | 3 | 15374 |
As you can see, for the same "orderKey" i have three lines (lineKey), each of them with a different item (itemKey), a different quantity, a different price and a different total cost (LinePrice). I want in the column MaxPriceItem the key of the item with the higher "LinePrice", but in the results is wrong. The three lines should show 15346 as the most expensive item but the last one is not right, and i can't see why. Also, the ROW_NUMBER partitioned by the same expression (LineMaxPriceNum) is giving me the right order.
If i change the expression of the ORDER BY within the MAX, like this (ordering by "OrderedQty"):
SELECT
orderHeader.orderKey, orderLines.lineKey, orderLines.itemKey, orderLines.OrderedQty,
orderLines.price, (orderLines.price*orderLines.OrderedQty) as LinePrice,
ROW_NUMBER() OVER(PARTITION BY orderHeader.orderKey ORDER BY orderLines.lineKey asc) AS [ItemLineNum],
ROW_NUMBER() OVER(PARTITION BY orderHeader.orderKey ORDER BY (orderLines.price*orderLines.OrderedQty) DESC) AS [LineMaxPriceNum],
max(orderLines.itemKey) OVER (PARTITION BY orderHeader.orderKey ORDER BY orderLines.OrderedQty DESC) as [MaxPriceItem]
FROM
orderHeader inner join orderLines on orderHeader.orderKey=orderLines.orderKey
Then it works:
| orderKey | lineKey | itemKey | OrderedQty | Price | LinePrice | ItemLineNum | LineMaxPriceNum | MaxPriceItem |
|----------|---------|---------|------------|-------|-----------|-------------|-----------------|--------------|
| 176141 | 367038 | 15346 | 3 | 1000 | 3000 | 2 | 1 | 15374 |
| 176141 | 367037 | 15159 | 2 | 840 | 1680 | 1 | 2 | 15374 |
| 176141 | 367039 | 15374 | 5 | 100 | 500 | 3 | 3 | 15374 |
The item with the highest "OrderedQty" is 15374 so the results are correct.
If i change, again, the expression of the ORDER BY within the MAX, like this (ordering by "Price"):
SELECT
orderHeader.orderKey, orderLines.lineKey, orderLines.itemKey, orderLines.OrderedQty,
orderLines.price, (orderLines.price*orderLines.OrderedQty) as LinePrice,
ROW_NUMBER() OVER(PARTITION BY orderHeader.orderKey ORDER BY orderLines.lineKey asc) AS [ItemLineNum],
ROW_NUMBER() OVER(PARTITION BY orderHeader.orderKey ORDER BY (orderLines.price*orderLines.OrderedQty) DESC) AS [LineMaxPriceNum],
max(orderLines.itemKey) OVER (PARTITION BY orderHeader.orderKey ORDER BY orderLines.price DESC) as [MaxPriceItem]
FROM
orderHeader inner join orderLines on orderHeader.orderKey=orderLines.orderKey
Then it happens the same than with the first example, the results are wrong:
| orderKey | lineKey | itemKey | OrderedQty | Price | LinePrice | ItemLineNum | LineMaxPriceNum | MaxPriceItem |
|----------|---------|---------|------------|-------|-----------|-------------|-----------------|--------------|
| 176141 | 367038 | 15346 | 3 | 1000 | 3000 | 2 | 1 | 15346 |
| 176141 | 367037 | 15159 | 2 | 840 | 1680 | 1 | 2 | 15346 |
| 176141 | 367039 | 15374 | 5 | 100 | 500 | 3 | 3 | 15374 |
The item with the highest price is 15346 but the MAX for the last record is not showing this.
What am i missing here? Why i'm getting those different results?
Sorry if the formatting is not properly done, it's my first question here and i've tried my best.
Thanks in advance for any help you can give me.
Upvotes: 3
Views: 4012
Reputation: 947
The accepted answer provides a reasonable alternate solution to the original problem, but doesn't really explain why the max()
function appears to work inconsistently. (And spoiler alert, you actually can use max()
as originally intended with a small tweak.)
You have to understand that aggregation functions actually operate on a window frame within a partition. By default, the frame is the entire partition. And so aggregation operations like max()
and sum()
do operate over the entire partition, exactly like you assumed. This default specification is defined as RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
. This just means that whatever record we're on, max()
looks back all the way to the first row in the partition, and all the way forward to the last row in the partition, in order to calculate the value.
But there's an insidious gotcha: Adding an ORDER BY
clause to the partition changes the the default frame specification to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. This means that whatever record we're on, max()
looks back all the way to the first row in the partition, and then only up to the current row, in order to calculate the value. You can see this clearly in your last example (simplified a bit):
SELECT orderKey, itemKey, price,
ROW_NUMBER() OVER(PARTITION BY orderKey ORDER BY price DESC) AS [PartitionRowNum],
MAX(itemKey) OVER (PARTITION BY orderKey ORDER BY price DESC) as [MaxPriceItem]
FROM orders
Result/explanation:
| orderKey | itemKey | Price | PartitionRowNum | MaxPriceItem | Commentary |
|----------|---------|-------|-----------------|--------------|------------------------|
| 176141 | 15346 | 1000 | 1 | 15346 | Taking max of rows 1-1 |
| 176141 | 15159 | 840 | 2 | 15346 | Taking max of rows 1-2 |
| 176141 | 15374 | 100 | 3 | 15374 | Taking max of rows 1-3 |
We can explicitly indicate the window frame specification by adding RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
to the partition as follows:
SELECT orderKey, itemKey, price,
ROW_NUMBER() OVER(PARTITION BY orderKey ORDER BY price DESC) AS [PartitionRowNum],
MAX(itemKey) OVER (PARTITION BY orderKey ORDER BY price DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as [MaxPriceItem]
FROM orders
Result/explanation:
| orderKey | itemKey | Price | PartitionRowNum | MaxPriceItem | Commentary |
|----------|---------|-------|-----------------|--------------|------------------------|
| 176141 | 15346 | 1000 | 1 | 15374 | Taking max of rows 1-3 |
| 176141 | 15159 | 840 | 2 | 15374 | Taking max of rows 1-3 |
| 176141 | 15374 | 100 | 3 | 15374 | Taking max of rows 1-3 |
Upvotes: 7
Reputation: 222462
I'm trying to get the ID of the item of the most expensive line, within an order.
You misunderstand the purpose of the order by
clause to the window function; it is meant to defined the window frame, not to compare the values; max()
gives you the maximum value of the expression given as argument within the window frame.
On the other hand, you want the itemKey
of the most expensive order line. I think that first_value()
would do what you want:
first_value(orderLines.itemKey) over(
partition by orderHeader.orderKey
order by orderLines.price * orderLines.OrderedQty desc
) as [MaxPriceItem]
Upvotes: 3