kodbuse
kodbuse

Reputation: 1010

Clustered GUID column and newsequentialid across servers

It is well known that using random values in a column with a clustered index isn't a good idea, which is why using GUIDs for a primary key with a clustered index is usually not recommended. Using the newsequentialid() function, we can overcome most of these difficulties.

However, what happens if you are generating your GUIDs on a farm of web servers, all hitting the same database? I am creating sequential IDs in .NET code using UuidCreateSequential as described in this article: http://blogs.msdn.com/b/dbrowne/archive/2012/07/03/how-to-generate-sequential-guids-for-sql-server-in-net.aspx

The problem is that while the resulting GUIDs are sequential from a single machine, the same is not true across multiple machines. Because the most significant 11 bytes (according to SQL Server) seem to stay pretty much the same for the same machine, it effectively sorts by machine and then time, rather than the desired opposite.

Would it be worthwhile and doable to reorder the bytes in the GUID to get near-sequential GUIDs between machines, or should I give up and make the indexes non-clustered?

Thanks!

Upvotes: 0

Views: 489

Answers (2)

Alexandre Palma
Alexandre Palma

Reputation: 1

you can also generate your id's on c#, take a look into this post on code project the prb is that the code generated by this implementation doesn’t match what NEWSEQUENTIALID generates, since that my goal was that the c# code would generate the last 6 bytes of the Guid as the NewSequentialID function of Sql server, I end up with the following code.

public static Guid ToSequentialAtEnd(this Guid guid)
{
    byte[] guidArray = guid.ToByteArray();

    DateTime now = DateTime.UtcNow;
    var baseDate = new DateTime(1900, 1, 1);

    // Get the days and milliseconds which will be used to build the byte string 
    var days = new TimeSpan(now.Ticks - baseDate.Ticks);
    TimeSpan msecs = now.TimeOfDay;

    // Convert to a byte array 
    // Note that SQL Server is accurate to 1/300th of a millisecond so we divide by 3.33333333 
    byte[] daysArray = BitConverter.GetBytes(days.Days);
    byte[] msecsArray = BitConverter.GetBytes((long)(msecs.TotalMilliseconds / 3.33333333));

    // Reverse the bytes to match SQL Servers ordering 
    Array.Reverse(daysArray);
    Array.Reverse(msecsArray);

    // Copy the bytes into the guid 
    Array.Copy(daysArray, daysArray.Length - 2, guidArray, guidArray.Length - 6, 2);
    Array.Copy(msecsArray, msecsArray.Length - 4, guidArray, guidArray.Length - 4, 4);

    return new Guid(guidArray);
}

Upvotes: 0

kodbuse
kodbuse

Reputation: 1010

After trying this, I'm going to answer my own question and say that generating sequential GUIDs (COMB GUIDs) from multiple machines as described in the question is a non-issue. Essentially you will have one separate sequence of IDs per machine, which will not result in page splits, since they will be appended to the end of different pages, and not in the middle of a page (as a new ID will always be the largest in its sequence).

While a GUID may not be as efficient as an int, I have not had any problems using this approach with millions of rows per table.

Upvotes: 1

Related Questions