SQL Server 2008 - Manually parse XML, or use built-in XML Indexes for XML field?

Question

I'm working on a logging database in SQL Server 2008. It'll consist mainly of one table something like this:

StepLog 
----------------
  StepLogID
  ClientID
  LogContent   XML
  CreateDate

Basically what will happen in this table is that various clients will log certain activities to this table. The LogContent field will be XML - untyped because we don't know what clients want to log.

To allow the LogContent field to be searched, the current plan is to shred out the LogContent field programmatically. The metadata for shredding would be in a table something like the following:

XPathAttribute
----------------
  XPathAttributeID
  AttributeName
  AttributeDescription
  XPath

Upon insert of a record into StepLog, we would have a stored procedure that would take all the Xpaths defined in XPathAttribute, and write them out to another table, XPathAttributeValue

XPathAttributeValue
----------------
  XPathAttributeValueID
  StepLogID
  AttributeID
  AttributeValue

My original idea, when looking at this design, was "why not just use the XML indexes, both primary and secondary? That would avoid lots of work on our side, and use built-in functionality.

I don't have a lot of experience with XML indexes, and the original designer had some poor experiences with XML indexes (poor performance) in SQL Server2005, that's how this design originated.

Feedback would be very much appreciated!

thanks, Sylvia

Remus Rusanu · Accepted Answer

XML indexes help in particular scenarios, as described in Secondary XML Indexes:

Following are some guidelines for creating one or more secondary indexes:

If your workload uses path expressions significantly on XML columns, the PATH secondary XML index is likely to speed up your workload. The most common case is the use of the exist() method on XML columns in the WHERE clause of Transact-SQL.

If your workload retrieves multiple values from individual XML instances by using path expressions, clustering paths within each XML instance in the PROPERTY index may be helpful. This scenario typically occurs in a property bag scenario when properties of an object are fetched and its primary key value is known.

If your workload involves querying for values within XML instances without knowing the element or attribute names that contain those values, you may want to create the VALUE index. This typically occurs with descendant axes lookups, such as //author[last-name="Howard"], where elements can occur at any level of the hierarchy. It also occurs in wildcard queries, such as /book [@* = "novel"], where the query looks for elements that have some attribute having the value "novel".

As you can see, each type of index is appropriate for a particular scenario. With an open ended approach like your project, is hard to tell which indexes are going to be helpful and which not.

Another thing to consider is that XML performs much better if you can declare an XML schema for the column, but the nature of your project does not allow this.

So overall I'd say... measure and see. Shredding the XML and storing the values in relational tables is very likely to boost performance over raw XML access. But that would apply if you know the schema and shred out a specific set of information, that you then index properly. Right now, even though you shred out some information, you shred it out into what basically is an EAV structure, which is difficult both to query and to optimize. I also recommend you read up on Best Practices for Semantic Data Modeling for Performance and Scalability for some discussions around the EAV shortcomings and how to avoid some problems.

SQL Server 2008 - Manually parse XML, or use built-in XML Indexes for XML field?

Answers (2)

Related Questions