Reputation: 1176
I'm currently using redshift. I was trying to execute a query to calculate a column called id_number
(data type INTEGER) from a VARCHAR column called id
to speed up further queries using id_number
instead of id
.
Here is the first query I tried :
select rank() over (order by id) id_number, id, sid1 ,sid2
from table
limit 10000
However, noticing that this query was taking quite some time, I tried the next query:
with A as(
select id, sid1, sid2
from table
limit 10000
)
select rank() over (order by id) id_number, id, sid1 ,sid2
from A
which was over in a flash.
How was it that the second query took such less time to execute, while the two queries seem to do the exact same thing?
If it is because of the positions of limit 10000
, how did the position of limit
contribute to the difference in execution time?
Upvotes: 1
Views: 95
Reputation: 247545
Your two queries are quite different.
The first one has to sort the complete table to get the rank()
and then emits the first 10000 rows of the result (with no particular ordering enforced).
The second one selects 10000 rows (without a particular ordering enforced) and then sorts those to calculate rank()
on them.
If the table is significantly larger than 10000 rows, it is unsurprising that the first query, which has to sort it all, is much slower.
Look at the EXPLAIN (ANALYZE, BUFFERS)
output to understand this better.
Upvotes: 2