Reputation: 481
In a database that I inherited to support there are several tables with an NVARCHAR(MAX)
column containing a blob of untyped XML data. I have been tasked with searching these XML blobs for references to a given ID, then stop when the first reference is found and report what was found. I am currently looping through each row, converting the blob to XML then using XQuery to search the blob for the ID.
However, the initial conversion of the NVARCHAR value to XML is taking way too long due to the large number of rows in the tables. Below is an example of the code I'm using to convert each blob, with HistoryID
being the PK and the value changing to grab the next XML blob from XMLData
:
SELECT @XML = CONVERT(XML, XMLData)
FROM dbo.History
WHERE HistoryID = 1;
I have also tried to leverage C# and LINQ, but the XML blobs are too large to pass back and forth from .NET to SQL Server. Unfortunately I also cannot modify the existing tables without breaking the application. Is there a better way to get these XML blobs into a searchable form or am I just SOL?
Upvotes: 1
Views: 2548
Reputation: 2115
SQL has ways of acting on XML in the database such as value, exist, nodes, query, and modify. The value property of a XML element is then queryable. More info: https://www.simple-talk.com/sql/learn-sql-server/the-xml-methods-in-sql-server/
Something like:
SELECT
CONVERT(XML, XMLData).value('(/Updated/UserId)[1]', 'int') AS UpdatedId
FROM History
WHERE CONVERT(XML, XMLData).exist('(/Updated/UserId)[1]') > 0
Would have to try it to see what happens if some of the xmldata blobs are as large as 360mb...
For performance, an XML index (if modifying the schema was possible) can be added to the table https://msdn.microsoft.com/en-us/library/ms191497.aspx.
Upvotes: 1
Reputation: 67311
No need for a loop... Try it like this
CREATE TABLE #tbl (ID INT,SomeColumn VARCHAR(100),YourXMLasString NVARCHAR(MAX));
INSERT INTO #tbl VALUES
(1,'Test 11','<root><id>100</id><more>abc</more></root>')
,(2,'Test 12','<root><id>100</id><more>abc</more></root>')
,(3,'Test 21','<root><id>200</id><more>def</more></root>')
,(4,'Test 22','<root><id>200</id><more>def</more></root>')
,(5,'Test 23','<root><id>200</id><more>def</more></root>')
,(6,'Test 3','<root><id>300</id><more>ghi</more></root>');
--Just take the id 200
SELECT *
FROM #tbl
CROSS APPLY(SELECT CAST(YourXMLasString AS XML)) AS A(RealXml)
WHERE RealXml.exist('/root[id=200]') = 1
--Now stop at the first with 200
SELECT TOP 1 *
FROM #tbl
CROSS APPLY(SELECT CAST(YourXMLasString AS XML)) AS A(RealXml)
WHERE RealXml.exist('/root[id=200]') = 1
ORDER BY ID
GO
DROP TABLE #tbl;
Whenever you search for an ID you do the conversion over and over ... It would be much faster of course, if you store the converted XML.
If you are searching for the same values more often, it might be a good idea to read these values from your XML and store it in dedicated indexed columns...
With this you could add a column to the existing table
ALTER TABLE #tbl ADD RealXml XML;
GO
UPDATE #tbl SET RealXml=CAST(YourXMLasString AS XML);
SELECT * FROM #tbl;
And with this you could create a side table
CREATE TABLE #SideTable(RefID INT,RealXML XML);
INSERT INTO #SideTable
SELECT ID,CAST(YourXMLasString AS XML)
FROM #tbl;
SELECT * FROM #SideTable;
If the data is changing, you might think about a trigger to do the conversion and update of the side column/table immediately.
Upvotes: 0
Reputation: 127
// xmlElement is the column read from DB , consider having an index on the field as well as batching the reads if its too big
public static List<History> DeserializeHistory(XElement xmlElement)
{
System.Xml.Serialization.XmlSerializer serializer = new System.Xml.Serialization.XmlSerializer(typeof(List<History>));
using (System.Xml.XmlReader reader = xmlElement.CreateReader())
{
return (List<History>)serializer.Deserialize(reader);
}
}
Upvotes: 0