Reputation: 103707
I have recently noticed that we have a number of tables stored in heaps (no clustered index). Would you create clustered indexes on them selectively, across the board, or not at all? Any other wisdom or advice?
There are some "codes" tables with 25 or so rows. However, there are several with well over a million rows.
EDIT Of the "big tables", all of them already have indexes, just not clustered ones. A few are log tables that are mostly written to, with little reading. A few are quite important: they are mostly inserted into once and then read many times by the application.
EDIT There are primary keys on all tables. The few I'm interested in are mainly inserted once but read many times to display screens.
On some of these tables, a block of related rows is inserted at one time and read many times with no updates, or the group is completely deleted and then reinserted as a block. The rows are usually read back as the same block, to display or make calculations from.
On another "type" of these tables, rows are repeatedly inserted in groups of related rows, with different groups being inserted all the time. For screen display, the complete group needs to be returned. For example, over time these groups of rows are inserted (where a group could be 5-50 rows):
1:00pm A1, B1, C1,
1:30pm A2, B2, C2,
2:00pm A3, B3, C3, D1
2:30pm A4, C4, D2
3:00pm C5, D3, E1
3:30pm D4, E2
The screen would need to display the complete set of A: A1+A2+A3+A4.
EDIT Based on @gbn's answer mentioning fragmentation, I used this query from marc_s and found the following fragmentation info for the heap tables with a million+ rows that are read many times and used by screens:
TableName index_type alloc_unit_type index_depth index_level avg_fragmentation_in_percent fragment_count avg_fragment_size_in_pages page_count avg_page_space_used_in_percent record_count ghost_record_count Version_ghost_record_count min_record_size_in_bytes max_record_size_in_bytes avg_record_size_in_bytes forwarded_record_count
--------- ---------- --------------- ----------- ----------- ---------------------------- -------------- -------------------------- ---------- ------------------------------ ------------ ------------------ -------------------------- ------------------------ ------------------------ ------------------------ ----------------------
TABLE_A HEAP IN_ROW_DATA 1 0 95.8294717330862 2069 8.18511358144031 16935 98.2659995058068 1125786 3 0 80 164 117.671 0
TABLE_A HEAP IN_ROW_DATA 1 0 95.8294717330862 2069 8.18511358144031 16935 98.2659995058068 1125786 3 0 80 164 117.671 0
TABLE_A HEAP IN_ROW_DATA 1 0 95.8314034275127 2070 8.18212560386473 16937 98.2559303187546 1125793 11 0 80 164 117.672 0
TABLE_B HEAP IN_ROW_DATA 1 0 99.2541594951233 1734 6.44982698961938 11184 94.5866567828021 1222729 0 0 68 82 68.037 0
TABLE_B HEAP IN_ROW_DATA 1 0 99.2541594951233 1734 6.44982698961938 11184 94.5866567828021 1222729 0 0 68 82 68.037 0
TABLE_B HEAP IN_ROW_DATA 1 0 99.197247706422 1735 6.44726224783862 11186 94.5725228564369 1222745 23 0 68 82 68.038 0
TABLE_C HEAP IN_ROW_DATA 1 0 71.5785224061365 1777 10.9527293190771 19463 97.4122807017544 2237831 0 0 9 84 66.588 2485
TABLE_C HEAP IN_ROW_DATA 1 0 71.5785224061365 1777 10.9527293190771 19463 97.4122807017544 2237831 0 0 9 84 66.588 2485
TABLE_C HEAP IN_ROW_DATA 1 0 71.589991928975 1778 10.9476940382452 19465 97.4023844823326 2237832 0 0 9 84 66.588 2485
TABLE_D HEAP IN_ROW_DATA 1 0 40.0769404842725 1773 19.7535250987028 35023 98.0193106004448 2778169 0 0 98 112 98.041 0
TABLE_D HEAP IN_ROW_DATA 1 0 40.0904977375566 1774 19.7480270574972 35033 98.0175315048184 2778821 0 0 98 112 98.044 0
TABLE_D HEAP IN_ROW_DATA 1 0 40.1040488577245 1775 19.7385915492958 35036 98.0142451198419 2778948 0 0 98 112 98.045 0
TABLE_E HEAP IN_ROW_DATA 1 0 97.1619365609349 2911 8.11473720371007 23622 99.390066716086 3333693 0 0 55 69 55.017 0
TABLE_E HEAP IN_ROW_DATA 1 0 97.1628838451268 2912 8.11332417582418 23626 99.3852359772671 3334016 0 0 55 69 55.018 0
TABLE_E HEAP IN_ROW_DATA 1 0 97.1638304971638 2913 8.11122554067971 23628 99.3799357548802 3334100 0 0 55 69 55.018 0
TABLE_F HEAP IN_ROW_DATA 1 0 21.9911471599199 8903 36.3093339323823 323262 94.6116753150482 4734053 44 0 521 535 521.046 0
TABLE_F HEAP IN_ROW_DATA 1 0 21.9911471599199 8903 36.3093339323823 323262 94.6116876698789 4734053 50 0 521 535 521.046 0
TABLE_F HEAP IN_ROW_DATA 1 0 21.9930761622156 8904 36.3057053009883 323266 94.6112428959723 4734079 78 0 521 535 521.047 0
TABLE_G HEAP IN_ROW_DATA 1 0 66.1932151660993 5649 11.9943352805806 67756 96.7873733629849 6632610 0 0 78 92 78.047 0
TABLE_G HEAP IN_ROW_DATA 1 0 66.1932151660993 5649 11.9943352805806 67756 96.7873733629849 6632610 0 0 78 92 78.047 0
TABLE_G HEAP IN_ROW_DATA 1 0 66.1971830985916 5650 11.9925663716814 67758 96.7855572028663 6632648 11 0 78 92 78.048 0
TABLE_H HEAP IN_ROW_DATA 1 0 11.5377268385864 5585 67.4340196956132 376619 92.3860637509266 6897347 0 0 9 427 406.418 3
TABLE_H HEAP IN_ROW_DATA 1 0 11.5449915110357 5576 67.5530846484935 376676 92.3849023968372 6898289 0 0 9 427 406.419 3
TABLE_H HEAP IN_ROW_DATA 1 0 11.5487458087518 5578 67.5313732520617 376690 92.3848035581913 6898534 0 0 9 427 406.42 3
TABLE_I HEAP IN_ROW_DATA 1 0 96.7330677290837 9715 8.23201235203294 79974 96.3321225599209 3152049 0 0 76 534 195.879 0
TABLE_I HEAP IN_ROW_DATA 1 0 96.7333930883378 9716 8.23157678056814 79978 96.3298122065728 3152142 0 0 76 534 195.879 0
TABLE_I HEAP IN_ROW_DATA 1 0 96.7337183827923 9717 8.23114129875476 79982 96.3323696565357 3152420 0 0 76 534 195.876 0
TABLE_J HEAP LOB_DATA 1 0 0 NULL NULL 87553 95.5205090190264 7790594 0 0 84 98 84.91 NULL
TABLE_J HEAP IN_ROW_DATA 1 0 31.2985438510012 23539 25.4966651089681 600166 96.4532863849765 7807684 0 0 435 1213 598.261 0
TABLE_J HEAP IN_ROW_DATA 1 0 31.2994591137993 23540 25.4959218351742 600174 96.4530145787003 7807780 0 0 435 1213 598.26 0
TABLE_J HEAP IN_ROW_DATA 1 0 31.3022047558782 23543 25.4936074417024 600196 96.4526068692859 7808096 0 0 435 1213 598.255 0
I'm not sure why there are multiple rows for each table, but the avg_fragmentation_in_percent values look fairly high for almost all of these tables. Would that fragmentation be a performance issue when reading? Would a clustered index be advised to defragment them?
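For reference, the column names above are those of sys.dm_db_index_physical_stats, so the query was presumably something along these lines (a sketch, not marc_s's exact query; run in the current database, with 'DETAILED' mode so the record counts are populated):

```sql
-- Sketch of a fragmentation query over sys.dm_db_index_physical_stats.
-- NULL object/index/partition arguments mean "all tables in this database".
SELECT  OBJECT_NAME(ips.object_id)       AS TableName,
        ips.index_type_desc,
        ips.alloc_unit_type_desc,
        ips.avg_fragmentation_in_percent,
        ips.page_count,
        ips.record_count,
        ips.forwarded_record_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') AS ips
ORDER BY TableName;
```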
Upvotes: 2
Views: 4749
Reputation: 5344
One problem with a clustered index on a large table is that the buffer memory (RAM) required to hold the index is equal to the size of the table; there is no separate index structure. A nonclustered index stores only the indexed columns plus the table's primary key or a row reference, so a normal index is much more likely to fit in RAM. If you are doing searches using the clustered index and the table is large, you could easily be slowing things down. If your clustered index is on a date column and your searches are all for recent dates, then maybe the clustered index won't hurt search performance, since you never access most of it.
I disagree with posters claiming the clustered index will reduce data fragmentation; it increases it. On a normal table, only deleting causes fragmentation. As you add rows to a clustered table, or change a field of the clustering key, SQL has to physically reorder the table. This means splitting and adding data pages, which increases fragmentation. That's why everyone recommends being careful to pick a clustering field that (a) doesn't change often, if ever, and (b) always increments.
I find that a clustered index is useful on large tables when your queries often need to return multiple "related" rows. You can use the cluster so that related rows are stored consecutively and are easier for SQL to retrieve. I wouldn't necessarily cluster a large table just to gain another index for searching.
The one advantage that clustering does have, like a covering index, is that the index contains the data the query is trying to return; there is no extra lookup from the index to the table to get the data.
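A sketch of that "related rows" idea, with hypothetical table and column names that match the grouping pattern in the question (cluster on the group key so a whole group sits on adjacent pages):

```sql
-- Hypothetical table: rows arrive in groups and are read back by group.
CREATE TABLE dbo.Readings (
    GroupId  char(2)      NOT NULL,  -- e.g. 'A', 'B', 'C' from the question
    Seq      int          NOT NULL,  -- position within the group
    Payload  varchar(100) NULL,
    -- Clustering on (GroupId, Seq) stores each group's rows consecutively.
    CONSTRAINT PK_Readings PRIMARY KEY CLUSTERED (GroupId, Seq)
);

-- Returning the complete set for group 'A' is then a single range scan.
SELECT GroupId, Seq, Payload
FROM   dbo.Readings
WHERE  GroupId = 'A';
```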
In the end, you have to get the profiler out, and run some tests.
Am I getting this right or missing something?
Upvotes: -1
Reputation: 432672
Always add a clustered index. Without a clustered index, you cannot quickly compact or defrag the table.
Simplistic, but I bet some of the performance issues could be traced to badly organised data.
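Once a clustered index exists, defragmenting becomes a one-liner (hypothetical index and table names; pick REBUILD or REORGANIZE depending on how fragmented the table is):

```sql
-- Heavy fragmentation: rebuild the clustered index from scratch.
ALTER INDEX CX_TableA ON dbo.TableA REBUILD;

-- Light fragmentation: reorganize the leaf pages in place.
ALTER INDEX CX_TableA ON dbo.TableA REORGANIZE;
```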
Upvotes: 3
Reputation: 15685
For large tables a clustered index is always a good idea, even for insert-only tables. Of course, your clustering key should be an ever-increasing value.
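A sketch of that pattern, with hypothetical names: an IDENTITY column as the ever-increasing clustering key, so new rows always append at the end of the table instead of splitting pages in the middle.

```sql
-- Hypothetical insert-heavy table clustered on an ever-increasing key.
CREATE TABLE dbo.EventLog (
    EventId  int IDENTITY(1,1) NOT NULL,       -- always increments
    LoggedAt datetime     NOT NULL DEFAULT (GETDATE()),
    Message  varchar(400) NULL,
    CONSTRAINT PK_EventLog PRIMARY KEY CLUSTERED (EventId)
);
```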
Upvotes: 2
Reputation: 56984
I would consider putting a clustered index on the large tables. The clustered index defines the physical order in which the records are stored. As a consequence, the rows in the table can be stored more efficiently, which reduces fragmentation. I'm sure there will be at least one column in each table that is a candidate for a clustered index. (And if not, you could create a new column containing the date and time the record was created, and put the clustered index on that column. I think this is still better than no clustered index at all.)
Edit: if the large tables are indeed log tables that aren't read frequently, they can be left as heaps.
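The fallback described above might look like this (hypothetical table and index names): add a creation timestamp and cluster on it when no natural candidate column exists. Note the date column isn't unique, which is fine — SQL Server adds a hidden uniquifier to duplicate clustering keys.

```sql
-- Add a creation timestamp to an existing table (hypothetical names).
ALTER TABLE dbo.BigTable
    ADD CreatedAt datetime NOT NULL DEFAULT (GETDATE());

-- Cluster on it: new rows land at the end, keeping fragmentation low.
CREATE CLUSTERED INDEX CX_BigTable_CreatedAt
    ON dbo.BigTable (CreatedAt);
```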
Upvotes: 0
Reputation: 416149
It depends on how the tables are used. Normally I want a clustered index on a table with millions of records, but you also need to consider how the table is used. Adding an index will slow down inserts, because SQL Server has to look up the proper page for each new record (and possibly split a page) rather than just append it. If these tables are primarily "dumps" for data and are rarely checked (like emergency logging, for example), then leaving them alone might be better.
As always, you should profile to find out what works best for your application or system.
Upvotes: 0
Reputation: 238296
Sounds like the database was created by someone who knew what he was doing. Log tables and small code tables are exactly where heaps make sense.
If there are no current problems with the database, I'd leave it as it is!
Upvotes: 5