Reputation: 45
I have schema (Oracle 11g R2):
CREATE TABLE users (
id INT NOT NULL,
name VARCHAR(30) NOT NULL,
num int NOT NULL
);
INSERT INTO users (id, name, num) VALUES (1,'alan',5);
INSERT INTO users (id, name, num) VALUES (2,'alan',4);
INSERT INTO users (id, name, num) VALUES (3,'julia',10);
INSERT INTO users (id, name, num) VALUES (4,'maros',77);
INSERT INTO users (id, name, num) VALUES (5,'alan',1);
INSERT INTO users (id, name, num) VALUES (6,'maros',14);
INSERT INTO users (id, name, num) VALUES (7,'fero',1);
INSERT INTO users (id, name, num) VALUES (8,'matej',8);
INSERT INTO users (id, name, num) VALUES (9,'maros',55);
And i execute following queries - using LAST_VALUE
analytic function only with ORDER BY
analytic clause :
My assumption is that this query executes over one partition - whole table (as partition by clause is missing). It will sort rows by name in given partition (whole table) and it will use default windowing clause RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
.
select us.*,
last_value(num) over (order by name) as lv
from users us;
But the query executed above will give exactly same results as following one. My assumption concerning second query is that this query firstly partition table rows by name then sort rows in every partition by num and then apply windowing clause RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
over each partition to get LAST_VALUE
.
select us.*,
last_value(num) over (partition by name order by num RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lv
from users us;
One of my assumption is clearly wrong because two above mentioned queries give the same result. It looks like the first query orders records also by num behind curtains. Could you please suggest what is wrong with my assumptions and why these queries return same results?
Upvotes: 1
Views: 3148
Reputation: 1269445
Here is a db<>fiddle, in case anyone wants to play with them.
Let me assume that you think that the second query is returning the correct results.
select us.*,
last_value(num) over (partition by name
order by num
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as lv
from users us;
Let me also point out that this is more succinctly written as:
select us.*,
max(num) over (partition by name
order by num
) as lv
from users us;
That is irrelevant to your question, but I want to point it out.
Now, why does this give the same results?
select us.*,
last_value(num) over (order by name) as lv
from users us;
Well, with no windowing clause, this is equivalent to:
select us.*,
last_value(num) over (order by name
range between unbounded preceding and current row
) as lv
from users us;
The range
is very important here. It does not go to the current row. It goes to all rows with the same value in name
.
In my understanding of the documentation around order by
, any num
value from rows with the same name could be chosen. Why? Sorting in SQL (and in Oracle) is not stable. That means that it is not guaranteed to preserve the original ordering of the rows.
In this particular case, it might be coincidence that the last value happens to be the largest value. Or, for some reason Oracle might be adding num
to the ordering for some reason.
Upvotes: 1
Reputation:
The answer is simple. For whatever reason, Oracle chose to make LAST_VALUE
deterministic when a logical (RANGE
) offset is used in the windowing clause (explicitly or implicitly - by default). Specifically, in such cases, the HIGHEST value of the measured expression is selected from among a set of rows tied by the order by
sorting.
Towards the bottom of that page in the Oracle documentation, we can read:
When duplicates are found for the
ORDER BY
expression, theLAST_VALUE
is the highest value of expr [...]
Why does the documentation say that in the examples section, and not in the explanation of the function? Because, as is very often the case, the documentation doesn't seem to be written by qualified people.
Upvotes: 2
Reputation: 520878
From this blog in Oracle magazine, here is what happens if you use an ORDER BY
clause in a window function without specifying anything else:
An ORDER BY clause, in the absence of any further windowing clause parameters, effectively adds a default windowing clause: RANGE UNBOUNDED PRECEDING, which means, “The current and previous rows in the current partition are the rows that should be used in the computation.” When an ORDER BY clause isn’t accompanied by a PARTITION clause, the entire set of rows used by the analytic function is the default current partition.
So, your first query is actually the same as this:
SELECT us.*, LAST_VALUE(num) OVER (ORDER BY name RANGE UNBOUNDED PRECEDING) AS lv
FROM users us;
If you run the above query, you will get the current behavior you are seeing, which will return a separate last value for each name. This differs from the following query:
SELECT
us.*,
LAST_VALUE(num) OVER (ORDER BY name
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lv
FROM users us;
This just generates the value 8
for the last value of num, which corresponds to the value for matej
, who is the last name when sorting name ascending.
Upvotes: 1