jeremcc

Reputation: 8741

Should I index a bit field in SQL Server?

I remember reading at one point that indexing a field with low cardinality (a low number of distinct values) is not really worth doing. I admit I don't know enough about how indexes work to understand why that is.

So what if I have a table with 100 million rows in it, and I am selecting records where a bit field is 1? And let's say that at any point in time, there are only a handful of records where the bit field is 1 (as opposed to 0). Is it worth indexing that bit field or not? Why?

Of course I can just test it and check the execution plan, and I will do that, but I'm also curious about the theory behind it. When does cardinality matter and when does it not?

Upvotes: 125

Views: 46860

Answers (18)

Chetan Verma

Reputation: 873

You need to be smart about how you query here: you must know the distribution of values in your column. If true is the more common value in your system and you want to find all the true values, write your query to check for not-false instead. It can help a lot; it's just a trick.

Upvotes: 0

Philippe Boucher

Reputation: 111

If your distribution is well known and unbalanced, like 99% of the rows having bit = 1 and 1% having bit = 0, then a WHERE clause with bit = 1 will take about the same time with a full table scan as with an index scan.

If you want a fast query for bit = 0, the best way I know is to create a filtered index, adding the clause WHERE bit = 0. That index will only store the 1% of rows, so a query with WHERE bit = 0 will simply let the query optimizer choose that index, and all rows in it will have bit = 0. You also get the benefit of a very small amount of disk space required compared to a full index on the bit column.
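A minimal sketch of such a filtered index (table and column names here are hypothetical; filtered indexes require SQL Server 2008 or later):

```sql
-- Hypothetical table dbo.Orders where ~99% of rows have IsArchived = 1.
-- The filtered index stores only the ~1% of rows where IsArchived = 0.
CREATE NONCLUSTERED INDEX IX_Orders_NotArchived
    ON dbo.Orders (OrderID)
    WHERE IsArchived = 0;

-- A query with a matching predicate can seek on the small index:
SELECT OrderID
FROM dbo.Orders
WHERE IsArchived = 0;
```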

Upvotes: 11

Ben Thul

Reputation: 32717

I just came across this question by way of another. Assuming that your statement that only a handful of the records assume the value of 1 (and that those are the ones you're interested in), then a filtered index could be a good choice. Something like:

create index [IX_foobar] on dbo.Foobar (FooID) where yourBitColumn = 1

This will create a substantially smaller index that the optimizer is smart enough to use when that is a predicate in your query.

Upvotes: 26

gbn

Reputation: 432421

Very late answer...

Yes, it can be useful, according to the SQL CAT team (the original link has since been updated and consolidated).

Upvotes: 1

Geoff Cox

Reputation: 6152

Consider what an index is in SQL - an index is really a chunk of memory pointing at other chunks of memory (i.e. pointers to rows). The index is broken into pages so that portions of it can be loaded and unloaded from memory depending on usage.

When you ask for a set of rows, SQL uses the index to find the rows more quickly than table scanning (looking at every row).

SQL has clustered and non-clustered indexes. My understanding of clustered indexes is that they group similar index values into the same page. This way, when you ask for all the rows matching an index value, SQL can return those rows from a clustered page of memory. This is why trying to put a clustered index on a GUID column is a bad idea - you don't try to cluster random values.

When you index an integer column, SQL's index contains a set of rows for each index value. If you have a range of 1 to 10, then you would have 10 index pointers. Depending on how many rows there are this can be paged differently. If your query looks for the index matching "1" and then where Name contains "Fred" (assuming the Name column is not indexed), SQL gets the set of rows matching "1" very quickly, then table scans to find the rest.
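For example (hypothetical table and column names):

```sql
-- Hypothetical: Category is indexed, Name is not.
-- The index quickly narrows the working set to rows where Category = 1;
-- the Name predicate is then checked against only those rows.
SELECT *
FROM dbo.People
WHERE Category = 1
  AND Name LIKE '%Fred%';
```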

So what SQL is really doing is trying to reduce the working set (number of rows) it has to iterate over.

When you index a bit field (or some narrow range), you only reduce the working set by the number of rows matching that value. If a small number of rows match, it would reduce your working set a lot. For a large number of rows with a 50/50 distribution, it might buy you very little performance gain versus the cost of keeping the index up to date.

The reason everyone says to test is because SQL contains a very clever and complex optimizer that may ignore an index if it decides table scanning is faster, or may use a sort, or may organize memory pages however it darn well likes.

Upvotes: 85

Ian Boyd

Reputation: 256891

You can't index a bit field in SQL Server 2000, as was indicated in the Books Online at the time:

bit

Integer data type 1, 0, or NULL.

Remarks

Columns of type bit cannot have indexes on them.

Yes, if you have only a handful of rows out of millions, an index will help. But if you want an index in this case, you need to make the column a tinyint.

Note: Enterprise Manager will not let you create an index on a bit column. If you wish you can still manually create an index on a bit column:

CREATE INDEX IX_Users_IsActiveUsername ON Users
(
   IsActive,
   Username
)

But SQL Server 2000 will not actually use such an index - running a query where the index would be a perfect candidate, e.g.:

SELECT TOP 1 Username 
FROM Users
WHERE IsActive = 0

SQL Server 2000 will do a table scan instead, acting as though the index doesn't even exist. If you change the column to a tinyint, SQL Server 2000 will do an index seek. Additionally, the following non-covered query:

SELECT TOP 1 * 
FROM Users
WHERE IsActive = 0

will perform an index seek, followed by a bookmark lookup.


SQL Server 2005 does have limited support for indexes on bit columns. For example:

SELECT TOP 1 Username 
FROM Users
WHERE IsActive = 0

will cause an index seek through the covering index. But the non-covered case:

SELECT TOP 1 * 
FROM Users
WHERE IsActive = 0

will not use the index: rather than performing an index seek followed by a bookmark lookup, it will perform a table scan (or clustered index scan).

Verified by experimentation and direct observation.

Upvotes: 1

John B

Reputation: 1179

Ian Boyd is correct when he says that you could not do it via Enterprise Manager for SQL 2000 (see his note regarding creating it through T-SQL).

Upvotes: 0

Steven A. Lowe

Reputation: 61242

Measure response time before and after and see if it is worthwhile. Theoretically it should improve performance for queries using the indexed fields, but it really depends on the distribution of true/false values and on the other fields involved in the queries you're concerned about.

Upvotes: 0

thijs

Reputation: 3465

If you want to know if an index has the effects you desire: test and test again.

In general you don't want an index that doesn't narrow down your table enough, because of the cost of maintaining it (cost > profit). But if the index in your case will cut the table in half, you may gain something by putting it on the table. It all depends on the exact size/structure of your table and on how you use it (the ratio of reads to writes).

Upvotes: 2

Jeremy

Reputation:

If your goal is to make queries for records where the bit field equals '1' faster, you might try an indexed view of your base table that contains only the records where your bit field equals '1'. In Enterprise Edition, if a query could make use of an indexed view instead of a specified table to improve performance, it will use the view. In theory this would increase the speed of SELECT queries that only look for records with a bit field value of '1'.
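A sketch of such an indexed view (table and column names are hypothetical; indexed views require SCHEMABINDING and two-part object names):

```sql
-- Hypothetical: materialize only the rows of dbo.Users where IsActive = 1.
CREATE VIEW dbo.ActiveUsers
WITH SCHEMABINDING
AS
SELECT UserID, Username
FROM dbo.Users
WHERE IsActive = 1;
GO

-- The unique clustered index is what actually materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_ActiveUsers
    ON dbo.ActiveUsers (UserID);
```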

http://www.microsoft.com/technet/prodtechnol/sql/2005/impprfiv.mspx

All this assumes you are using Microsoft SQL Server 2005 Enterprise Edition. The same might apply to 2008; I'm not familiar with that version.

Upvotes: 2

C. Dragon 76

Reputation: 10062

100-million records with only a few having the bit field set to 1? Yes, I would think indexing the bit field would definitely speed up querying the bit=1 records. You should get logarithmic search time from the index and then only touch the few pages with bit=1 records. Otherwise, you'd have to touch all pages of the 100-million record table.

Then again, I'm definitely not a database expert and could be missing something important.

Upvotes: 11

Craig Nicholson

Reputation: 1241

On its own, no, as it results in very little selectivity. As part of a compound index, quite possibly, but only after other equality columns.

Upvotes: 1

Anthony Potts

Reputation: 9160

Cardinality is one factor; the other is how well the index divides your data. If you have about half 1s and half 0s, then it will help (assuming that index is a better path to choose than some other index). However, how often are you inserting and updating? Adding indexes for SELECT performance also hurts INSERT, UPDATE and DELETE performance, so keep that in mind.

I would say, if the ratio of 1s to 0s (or vice versa) isn't better than 75% to 25%, don't bother.

Upvotes: 0

DJ.

Reputation: 16257

"I remember reading at one point that indexing a field with low cardinality (a low number of distinct values) is not really worth doing"

That's because SQL Server will almost always find it more efficient to just do a table scan than to read the index. So basically your index will never get used, and it's a waste to maintain it. As others have said, it might be OK in a compound index.

Upvotes: 2

BradC

Reputation: 39986

While I don't think I would index JUST a bit column by itself, it is very common to include bit columns as part of a compound index.

A simple example would be an index on ACTIVE, LASTNAME instead of just lastname, when your application is almost always looking for active customers.
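As a sketch (table and column names here are hypothetical), such a compound index might look like:

```sql
-- Hypothetical: most queries filter on Active = 1, then search by last name.
CREATE NONCLUSTERED INDEX IX_Customers_Active_LastName
    ON dbo.Customers (Active, LastName);

-- A typical query that can seek on this index:
SELECT CustomerID, LastName
FROM dbo.Customers
WHERE Active = 1
  AND LastName LIKE 'Sm%';
```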

Upvotes: 8

anon

Reputation:

As others have said, you'll want to measure this. I don't recall where I've read this, but a column needs to have very high cardinality (around 95%) in order for an index to be effective. Your best test for this would be to build the index and examine the execution plans for the 0 and 1 values of the BIT field. If you see an index seek operation in the execution plan then you know that your index will be used.

Your best course of action would be to test with a basic SELECT * FROM table WHERE BitField = 1; query and slowly build out the functionality from there, step by step, until you have a realistic query for your application, examining the execution plan at every step to make sure the index seek is still being used. Admittedly, there is no guarantee that this execution plan will be used in production, but there is a good chance that it will be.
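A minimal sketch of that incremental testing approach (table and column names are hypothetical):

```sql
-- Show I/O statistics so scans vs. seeks are easy to compare,
-- alongside the actual execution plan.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Start simple, then add joins and filters step by step,
-- re-checking the execution plan after each change.
SELECT *
FROM dbo.MyTable
WHERE BitField = 1;

SELECT *
FROM dbo.MyTable
WHERE BitField = 0;  -- likely a scan if most rows have 0
```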

Some information can be found on the sql-server-performance.com forums and in the referenced article

Upvotes: 2

Bogdan Maxim

Reputation: 5946

Of course it's worth it, especially if you need to retrieve the data by that value. It would be similar to using a sparse matrix instead of a normal matrix.

Now with SQL 2008 you can use partitioning functions and filter the data that goes into an index. The disadvantage in earlier versions is that the index would be built over all the data, but this can be mitigated by storing the interesting values in a separate filegroup.

Upvotes: 2

jason saldo

Reputation: 9960

Is this a common query? It may be worth it when looking for the "handful" of records but won't help you much on the other rows. Are there other ways to identify the data?

Upvotes: 0
