Ram
Ram

Reputation: 570

Why is MIN($column) doing an Index scan instead of seeking to min value?

I have a table with more than a billion rows, with one of the column definitions being:

row_id int NOT NULL

And an index on it

 CREATE NONCLUSTERED INDEX row_id_IX ON dbo.test_table (  row_id ASC  )  

I want to find the minimum value of row_id in the table. So I write

select min(row_id) from test_table 

Given this index is a B+ tree in SQL Server, and the values are NOT-nullable, I expect the query plan to recursively seek towards left child of the root node of B+ tree, until it finds the minimum value. But instead, the planner decides to do an Index scan and ends up reader a billion values followed by an aggregation to find the min value.

Why is that?

enter image description here

row_id_IX is a partitioned index.

Upvotes: 1

Views: 117

Answers (1)

Martin Smith
Martin Smith

Reputation: 453736

A solution to this is to get the MIN per partition (which should just require reading the single row at the end of the index from each) and then get the MIN of those.

One query approach for this (covered by Itzik Ben Gan here) is

SELECT MIN(m.row_id) AS row_id
FROM   sys.partitions AS P
       CROSS APPLY (SELECT MIN(row_id) AS row_id
                    FROM  dbo.test_table
                    WHERE  $PARTITION.YourPartitionFunction(YourPartitionColumn)  = P.partition_number) AS m
WHERE  P.object_id = OBJECT_ID('dbo.test_table')
       AND P.index_id <= 1; 

You haven't told us the name of your partition function or partitioning column so replace YourPartitionColumn and YourPartitionFunction as appropriate above (and remember to replace both references to test_table when applying it against a different table name).

The execution plan for this should have the partition numbers returned from sysrowsets on the outside of a nested loops and on the inside of this an ordered index scan with a seek predicate to seek into just the single partition of interest. This should be under a TOP 1 operator to stop reading from the scan after the first row is read.

If you don't see the above plan (e.g. as the column you are aggregating is nullable) a more verbose alternative is

SELECT MIN(m.row_id) AS row_id
FROM   sys.partitions AS P
       CROSS APPLY (SELECT TOP 1 row_id
                    FROM  dbo.test_table
                    WHERE  $PARTITION.YourPartitionFunction(YourPartitionColumn)  = P.partition_number
                       AND row_id IS NOT NULL
                       ORDER BY row_id
                    ) AS m
WHERE  P.object_id = OBJECT_ID('dbo.test_table')
       AND P.index_id <= 1; 

The more verbose alternative does mean that you will potentially miss out on the message:

Warning: Null value is eliminated by an aggregate or other SET operation.

but I doubt anyone cares about seeing that.

Upvotes: 6

Related Questions