Reputation: 65238
Hi I have a need to store hundreds if not thousands of elements in the database as XML. I will not index anything in the XML field. I will simply select certain elements within the xml. I would like to know if there is any performance penalty for simply selecting fields in the XML. Here is example XML that will be stored in the database.
<fields>
<field name="FirstName" type="text" value="Gary" sort="2" />
<field name="LastName" type="text" value="Smith" sort="3" />
<field name="City" type="text" value="Los Angeles" sort="4" />
<field name="Age" type="number" value="12" sort="6" />
<field name="Address" type="text" sort="2">
<streetnumber value="1234" />
<streetname value="sail" />
</field>
</fields>
I will probably have more than 3000 field tags in one record. I simply want to get 10 fields in a single query. I will have a primary key on the table and will be selecting records based on the primary key but will be getting fields from the XML column. I am afraid the more field elements I put in the XML will compromise performance. Will there be a performance penalty for simply selecting 10 or more fields from the XML column? Also, I will not be using the xml column in a where clause I will use the primary in the where clause then I will select fields from the XML column. Will there be a performance penalty?
Upvotes: 2
Views: 6904
Reputation: 1701
I realize that this is not a direct answer to the OP's question (although, it's related to it), but I think this could really help many people that have been redirected here looking for some idea on how to deal with the poor performance of the XML data type on SQL Server. After many years struggling with this issue, I finally found a solution that, for some reason, is not that easy to come by:
SXI - Selective XML Indexes (starting with SQL 2008)
MS Docs link: https://learn.microsoft.com/en-us/sql/relational-databases/xml/selective-xml-indexes-sxi?view=sql-server-2017
On my local tests with tables containing 3MM+ records, it worked amazingly well!
Upvotes: 0
Reputation: 650
[BEGIN EDIT]
jbl's direct answers to your questions, and Terror.Blade's answer re' XML being better than NVARCHAR(MAX) both make sense (I upvoted them :).
My experience was without storing an XML schema in SQL Server (Terror.Blade's tip), and without indexing (jbl gave the most, re' that)... but I'm leaving my answer, because I think my links could be very helpful... and it's still an example of worst case ;)
[END EDIT]
From experience, I'll say that the loading of an XML data type is quick, but as for using it -- I found that to be slow, but the personal example coming to mind involved updating, and using xQuery, and those may have been factors in my slowdown. In that example, it took 1hr55mins to process only 127,861 rows. (Terror.Blade's tip, of storing an XML schema in SQL Server, and jbl's link & share re' XML indexing both sound pretty slick ;) and might address that slowdown.)
RELATED:
Here's some tips re' optimizing XML in SQL... though some of them only apply if you have control over the format of the XML:
http://msdn.microsoft.com/en-us/library/ms345118.aspx
If you're using xQuery, check out these docs:
http://download.microsoft.com/download/0/F/B/0FBFAA46-2BFD-478F-8E56-7BF3C672DF9D/XQuery%20Language%20Reference.pdf
((And if you're using SQLXMLBulkLoad at all, consider using "overflow-field"s, to capture whatever is not defined in your schema. There's some useful tips in this tangentially related TechNote:
http://social.technet.microsoft.com/Forums/sqlserver/en-US/393cf604-bf6e-488b-a1ea-2e984aa14500/how-do-i-redirect-xml-comments-that-sqlxmlbulkload-is-storing-in-the-overflowfield?forum=sqlxml ))
HTH.
Upvotes: 1
Reputation: 15413
Based on my experience on XML in SQL Server Xml datatype, and on Indexes on XML Data Type Columns (the whole section deserves thorough reading)
Will there be a performance penalty for simply selecting 10 or more fields from the XML column ?
Yes, because your XML document is stored as a blob. Without a primary XML index, this blob will need to be exploded for query processing (filtering and projection) As to XML, indexes can be seen as a relational representation of your document (pre-exploding the blob)
Without an index, these binary large objects are shredded at run time to evaluate a query. This shredding can be time-consuming
As to your second question
Also, I will not be using the xml column in a where clause I will use the primary in the where clause then I will select fields from the XML column. Will there be a performance penalty?
If you are going to project among 3000 field tags, you might benefit from a secondary XML index, though I'm not sure which one. PROPERTY secondary index seems fit for projection, but it seems to apply on value
calls (the french documentation seems to imply more than just value
calls but that may be some translation mistake)
For my part, I ended-up setting the three kind of secondary indexes on my XML column (1 million documents on 30 different schemas, 50-100 elements each) But my app requires a lot more filtering than projection.
Upvotes: 3