Chris Schaller

Reputation: 16767

In a SQL table with many rows, how can I quickly determine if a query might return more than 1000 rows

NOTE: This is a re-posting of a question from a Stack Overflow Teams site to attract a wider audience

I have a transaction log table with many millions of records. Many of the data items linked to these logs have more than 100K log rows each.

I have a requirement to display a warning if a user tries to delete an item that has more than 1000 rows in the log table.

We have determined that 1000 logs mean the item is in use.

If I simply query the table to look up the total number of log rows, the query takes too long to execute:

SELECT COUNT(1) 
FROM History
WHERE SensorId IN (SELECT Id FROM Sensor WHERE DeviceId = 96)


Is there a faster way to determine if the entity has more than 1000 log records?

NOTE: history table has an index on the SensorId column.

Upvotes: 0

Views: 1939

Answers (1)

Chris Schaller

Reputation: 16767

You are right to use COUNT instead of returning all the rows and checking the record count, but we are still asking the database engine to read every matching row.

If the requirement is not to return the exact number of rows, but just to determine whether there are more than X rows, then the first improvement I would make is to count only the first X rows from the table.

So if X is 1000, your application logic does not need to change; you will still be able to distinguish an item with 999 logs from one with 1000+.

We simply change the existing query to select the TOP(X) rows instead of the count, and then return the count of that result set. Select only the primary key or a uniquely indexed column, so that the engine inspects just the index and not the underlying table store.

SELECT COUNT(Id) FROM (
  SELECT TOP(1000)    -- limit the seek that the DB engine performs
         Id           -- further constrain the query to just the indexed column
  FROM History 
  WHERE SensorId IN ( -- the same filter condition as before, just re-formatted
    SELECT Id 
    FROM Sensor 
    WHERE DeviceId = 96)
) AS trunk

Changing this query to TOP(10000) still gives a sub-second response; however, with X = 100,000 the query took almost as long as the original.
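Here is a minimal, self-contained sketch of the same pattern using Python and SQLite. SQLite uses LIMIT rather than T-SQL's TOP(X), but the idea is identical: the engine stops reading the index once the limit is reached. The table and column names mirror the question; the `has_more_logs_than` helper is hypothetical, just to illustrate the shape of the check.

```python
import sqlite3

# Build a tiny in-memory schema mirroring the question's tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sensor (Id INTEGER PRIMARY KEY, DeviceId INTEGER)")
conn.execute("CREATE TABLE History (Id INTEGER PRIMARY KEY, SensorId INTEGER)")
conn.execute("CREATE INDEX IX_History_SensorId ON History(SensorId)")

conn.execute("INSERT INTO Sensor VALUES (1, 96)")
# 2500 log rows for device 96's sensor
conn.executemany("INSERT INTO History (SensorId) VALUES (?)", [(1,)] * 2500)

def has_more_logs_than(conn, device_id, threshold):
    # Count at most threshold + 1 rows; the LIMIT stops the scan early,
    # so the cost is bounded by the threshold, not by the table size.
    (count,) = conn.execute(
        """
        SELECT COUNT(Id) FROM (
            SELECT Id FROM History
            WHERE SensorId IN (SELECT Id FROM Sensor WHERE DeviceId = ?)
            LIMIT ?
        )
        """,
        (device_id, threshold + 1),
    ).fetchone()
    return count > threshold

print(has_more_logs_than(conn, 96, 1000))  # True: 2500 rows exceed 1000
```

Using LIMIT threshold + 1 (rather than threshold) makes the check strictly "more than X": the count can exceed the threshold only when at least X + 1 rows exist.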


There is another, seemingly 'silver bullet', approach to this type of issue if the table in question has a high transaction rate and the main reason for the long execution time is waiting caused by lock contention.

If you suspect that locks are the issue, and you can accept a count that includes uncommitted rows, then you can use the WITH(NOLOCK) table hint to let the query run effectively at the READ UNCOMMITTED transaction isolation level.

There is a good discussion about the effect of the NOLOCK table hint on select queries here

SELECT COUNT(1) FROM History WITH (NOLOCK) 
WHERE SensorId IN (SELECT Id FROM Sensor WHERE DeviceId = 96)

Although NOLOCK is strongly discouraged in general, this is a scenario where it can reasonably be permitted. It even makes sense here: your count before the delete will then take into account rows that another user or operation is actively adding to the log.

After many trials, when querying for 1000 or 10K rows, the TOP(X) with COUNT solution is still faster than using the NOLOCK table hint. NOLOCK, however, lets you run essentially the same query with minimal change while still returning in a timely manner.

The execution time of a SELECT with NOLOCK will still grow as the number of rows in the underlying result set grows, whereas the TOP query with no ORDER BY clause should remain constant once the row count exceeds the TOP limit.

Upvotes: 3
