Reputation: 3346
(This post is soliciting personal experiences about storing XML; please share what you know. :-) )
I am working on a service application that communicates with an external service using XML. I am planning to use SQL Server 2008 to store the XML that is received and sent to the external service. I'm exploring my options for storing the XML in the database. The three options that I have identified are:
I'm looking for any advice, based on your personal experience, with storing and retrieving XML data in SQL Server.
Some additional background: I've used an 'xsd.exe' equivalent called XsdObjectgenerator to create .net classes based on the XML schemas. When the service receives the XML file, it is deserialized into an instance of .net class. This instance is used to perform the operations of the service. My original plan was to then use option #1 above to store the XML. If I needed to update or report on the data, I would simply deserialize the db record back into one of my .net classes.
Although this approach works and makes working with the xml very simple, I have concerns that as the volume of data increases, the performance of querying XML data type records will decrease. This is why I've explored options 2. & 3. above.
In addition to storing the XML, the XML will be queried for use in both reports and a separate web application. The db records will be queried, sorted, filtered, grouped, summaried and possibly updated by the end users.
Upvotes: 7
Views: 1245
Reputation: 40319
A few jobs back (SQL 2000), we were storing XML as TEXT data, and our databases became significantly bloated--not so much with the data as with the tags used to identify it. I did some testing, and pkzip (I did say it was several jobs ago) crunched all the data down to 3% of it's original size.
Advice #1: Identify how long you need to store the data, and if/when possible archive old data out.
Advice #2: If you are using SQL 2008, look into the data compression options for the XML columns.
(Might not be relevant if your XMLs are short, but ours were all in the kbs and 10kbs.)
Upvotes: 1
Reputation: 3346
While continuing to explore solutions, a collegue forwarded the following applicable links:
Some preliminary conclusions from these articles and other research:
I will be mocking up each solution with test data and performing some benchmarking. I'll post the results here once they are available.
Upvotes: 1
Reputation: 754628
I guess it depends on what you want to do with your XML in your database.
If you're mostly only storing it, and possibly retrieving it later as a whole and sending it out again, then I'd definitely use the XML datatype - no point in shredding it into bits and pieces.
If you however need to mostly work with the contents of the XML file, and possibly also manipulate and change that content, then it might be advisable to create tables with columns to match your XML content, and shred it when storing it, use it, and when you need to, reassemble it from the relational pieces using something like SELECT (columns) FROM dbo.Table FOR XML.....
There is an overhead involved in shredding and reassembling - so you need to ask yourself if that's worth doing. But there's also an overhead involved if you need to manipulate the XML column too much.
If you only need read-only access to a few attributes in your XML, I've come to appreciate the ability to wrap those into a UDF and surface it as a computed column in your table. That way, you can easily select something from your table, based on values that are stored somewhere inside your XML - quite handy! But do not overuse this approach - works fine for 2, 3 attributes - but if you need to access your XML over and over again (and most or all of it), then you might be better off shredding it into relational pieces to begin with.
Upvotes: 5