Reputation: 1806
We have a legacy database which is a sql server db (2005, and 2008).
All of the primary keys in the tables are UniqueIdentifiers.
The tables currently have no clustered index created on them and we are running into performance issues on tables with only 750k records. This is the first database i've worked on with unique identifiers as the sole primary key and I've never seen sql server be this slow with returning data.
I don't want to create a clustered index on the uniqueidentifier as they are not sequential and will therefore slow the apps down when it comes to inserting data.
We cannot remove the uniqueidentifier as that is used for remote site record identity management purposes.
I had thought about adding a big integer identity column to the tables and creating the clustered index on this column and including the unique identifier column.
i.e.
int identity - First column to maintain insert speeds unique identifier - To ensure the application keeps working as expected.
The goal is to improve the identity query and joined table query performance.
Q1: Will this improve the query performance of the db or will it slow it down?
Q2: Is there an alternative to this that I haven't listed?
Thanks Pete
Edit: The performance issues are on retrieving data quickly through select statements, especially if a few of the more "transactional / changing" tables are joined together.
Edit 2: The joins between tables are generally all between the primary key and foreign keys, for tables that have foreign keys they are included in the non-clustered index to provide a more covering index.
The tables all have no other values which would provide a good clustered index.
I'm leaning more towards adding an additional identity column on each of the high load tables and then including the current Guid PK column within the clustered index to provide the best query performance.
Edit 3: I would estimate that 80% of the queries are performed on primary and foreign keys alone through the data access mechanism. Generally our data model has lazy loaded objects which perform the query when accessed, these queries use the objects id and the PK column. We have a large amount of user driven data exclusion / inclusion queries which use the foreign key columns as a filter based on the criteria of for type X exclude the following id's. The remaining 20% is where clauses on Enum (int) or date range columns, very few text based queries are performed in the system.
Where possible I have already added covering indexes to cover the heaviest queries, but as yet i'm still dissapointed by the performance. As bluefooted says the data is being stored as a heap.
Upvotes: 6
Views: 475
Reputation: 50251
It's not 100% clear to me: is your number 1 access pattern to query the tables by the GUID or by other columns? And when joining to other tables, what columns (and data types) are most often used?
I can't really give you any solid recommendations until I understand more about how these GUIDs are used. I realize you said they're primary keys, but that doesn't guarantee they are used as the primary conditions on queries or in joins.
UPDATE
Now that I know a little more, I have a crazy suggestion. Do cluster those tables on the GUIDs, but set the fill factor to 60%. This will ameliorate the page split problem and give you better performance querying on those puppies.
As for using Guid.NewGuid(), it seems that you can do sequentialGUIDs in C# after all. I found the following code here on SO:
[DllImport("rpcrt4.dll", SetLastError = true)]
static extern int UuidCreateSequential(out Guid guid);
public static Guid SequentialGuid()
{
const int RPC_S_OK = 0;
Guid g;
if (UuidCreateSequential(out g) != RPC_S_OK)
return Guid.NewGuid();
else
return g;
}
newsequentialID() is actually just a wrapper for UuidCreateSequential. I'm sure if you can't use this directly on the client you can figure out a way to make a quick round-trip to the server to get a new sequential id from there, perhaps even with a "dispenser" table and a stored procedure to do the job.
Upvotes: 1
Reputation: 1105
If you don't have a clustered index on the table, it is being stored as a heap rather than a b-tree. Heap data access is absolutely atrocious in SQL Server so you definitely need to add a clustered index.
I agree with your analysis that the GUID column is a poor choice for clustering, especially since you don't have the ability to use NEWSEQUENTIALID(). You could create a new artificial integer key if you like, but if there is another column or combination of columns that would make sense as a clustered index, that is fine as well.
Do you have a field that is used frequently for range scans? Which columns are used for joins? Is there a combination of columns that also uniquely identifies the row aside from the GUID? Posting a sample of the data model would help us to suggest a good candidate for clustering.
Upvotes: 4
Reputation: 135848
I'm not sure where your GUIDs come from, but if they're being generated during the insert, using the NEWSEQUENTIALID() in SQL Server instead of NEWID() will help you avoid fragmentation issues during insert.
Regarding the choice of a clustered index, as Kimberly L. Tripp states here: "the most important factors in choosing a clustered index are that it's unique, narrow and static (ever-increasing has other benefits to minimizing splits)." A GUID falls short on the narrow requirement when compared to an INT or even BIGINT.
Kimberly also has an excellent article on GUIDs as PRIMARY KEYs and/or the clustering key.
Upvotes: 2
Reputation: 22204
You don't indicate what your performance issues are. If the worst performing action is an INSERT, then maybe your solution is right. If it's something else, then I'd look at how the clustered index can help that.
You might look at existing indexes on the table and the queries that use them. You may be able to select an index that, while degrades INSERTs slightly, provides a greater benefit to the current performance-problem areas.
Upvotes: 0