Rahul Gupta
Rahul Gupta

Reputation: 1802

Select max(sort_key) from tbl_5billion_rows taking too long

I have redshift table with 5 billion rows which is going to grow alot in near future. When I run a simple query

select max(sort_key) from tbl

it takes 30 sec.
I have only one sort key in the table.
I have run vacuum and analyze on the table recently.
The reason I am worried about 30 sec is, I use max(sort_key) multiple times in my subquery.
Is there anything I am missing?

Output Explain select max(sort_key) from tbl

XN Aggregate  (cost=55516326.40..55516326.40 rows=1 width=4)
  ->  XN Seq Scan on tbl  (cost=0.00..44413061.12 rows=4441306112 width=4)



Output Explain select sort_key from tbl order by sortkey desc limit 1

XN Limit  (cost=1000756095433.11..1000756095433.11 rows=1 width=4)
->  XN Merge  (cost=1000756095433.11..1000767198698.39 rows=4441306112 width=4)
        Merge Key: sort_key
        ->  XN Network  (cost=1000756095433.11..1000767198698.39 rows=4441306112 width=4)
              Send to leader
              ->  XN Sort  (cost=1000756095433.11..1000767198698.39 rows=4441306112 width=4)
                    Sort Key: sort_key
                    ->  XN Seq Scan on tbl  (cost=0.00..44413061.12 rows=4441306112 width=4)

Upvotes: 0

Views: 337

Answers (1)

John Rotenstein
John Rotenstein

Reputation: 269171

Finding the MAX() of a value requires Amazon Redshift to look through every value in the column. It probably isn't smart enough to realise that the MAX of the Sortkey is right at the end.

You could speed it up by helping the query use Zone Maps, which identify the range of values stored in each block.

If you know that the maximum sortkey is above a particular value, include that in the WHERE clause, eg:

SELECT MAX(sort_key) FROM tbl WHERE sort_key > 50000;

This will dramatically reduce the number of blocks that Redshift needs to retrieve from disk.

Upvotes: 1

Related Questions