Reputation: 3831
We have a very large database and have been using shards which we want to get away from. The shards work by everytime a table gets really big, we start a new table that has the same schema as the previous table and keep a number in another table that helps us find which table the data is in. This is a cumbersome manual process and means we have data spread out over N different tables all with the same schema.
The idea we are trying for is to eliminate this need for sharding by using indexes. Our data lookup queries do not use unique keys and many records are returned that have the same values across columns.
The following illustrates many of our lookup selects for a particular table, the fields with the * indicate that field may or may not be in the select.
where clause: scheduled_test, *script, *label, *error_message
group/order: messenger_id, timeslice, script, label, error_message, step_sequence, *adapter_type
My thought is that I would not want to create an index with all of these 11 fields. I instead picked 3 of the ones that seemed to be used more commonly including the one that is always in the where clause. I had read that it is advisable not to have too wide an index with too many fields. I also had heard that the optimizer will use the indexed fields first and that it is not uncommon to have non unique indexes even though MSDN states to the effect that unique indexes is the big advantage. It's just not how our data is designed. I realize SQL will add something to the index to make it unique, but that doesn't seem to matter for our purposes.
When I look at the execution plan in sql server management studio on a query that is similar to what we might run, it says "clustered index seek cost 100%", but it is using the clustered index that I created so I am hoping this is better than the default clustered index that is just the generated primary key (previously how the table was defined). I am hoping that what I have here is as good or better than our sharding method and will eliminate the need for the shards.
We do insert alot of data into the tables all at once, but these rows all have the same data values across many columns and I think they would even tend to get inserted at the end as well. These inserts don't share values with older data and if the index is just 3 columns hopefully that would not be a very big hit on the inserts.
Does what I am saying seem reasonable or what else should I look into or consider ? Thanks alot, I am not that familiar with these types of indexing issues but have been looking on various websites and experimenting.
Upvotes: 0
Views: 1666
Reputation: 9298
Generally, the narrower the clustered index the better as the clustering key of the clustered index will be added to all non-clustered indexes, making them less efficient.
SQL server will add a uniquifier to non-unique clustered indexes, making them (and all non-clustered indexes) even wider still.
If the space used by these indexes is not an issue for you, then you should consider whether the value of the clustered index key is ever increasing (or decreasing) as if it isn't, you will get page splits and fragmentation which will definitely hurt your inserts.
It's probably worth setting this up in a test system if you can to examine the impact different indexing strategies have on your normal queries.
Upvotes: 3