Reputation: 313
I have data stored in XML, which is structured in more of an HTML format than real XML. The element head contains column names and row elements contain data for these columns.
<root>
<head>
<value>DeviceID</value>
<value>VolumeName</value>
<value>SizeGB</value>
<value>FreeSpaceGB</value>
<value>FreeSpacePercent</value>
</head>
<row>
<value>C:</value>
<value>c$OS-COMPUTER01</value>
<value>126</value>
<value>43</value>
<value>34</value>
</row>
<row>
<value>D:</value>
<value>D$DB-COMPUTER01</value>
<value>127</value>
<value>104</value>
<value>82</value>
</row>
<row>
<value>E:</value>
<value>E$COMPUTER01</value>
<value>127</value>
<value>106</value>
<value>84</value>
</row>
</root>
The issue here is, this is just one of the collection items I have. There are other collection items, which have the same format, but possibly different number of columns, therefore I cannot just hard-code the column names into the select statement. Moreover some values can be empty. Example of other collection item can be found below:
<data>
<head>
<value>FriendlyName</value>
<value>DnsNameList</value>
<value>NotAfter</value>
<value>NotBefore</value>
<value>Thumbprint</value>
<value>Issuer</value>
</head>
<row>
<value />
<value />
<value>08/23/2021 07:59:59</value>
<value>08/23/2011 08:00:00</value>
<value>E1A7D7E3BD6CA0C832182248EF7F092A72F9EB67</value>
<value>CN=Hewlett-Packard Private Class 2 Certification Authority, O=Hewlett-Packard Company, C=US, OU=IT Infrastructure, O=hp.com</value>
</row>
<row>
<value>Microsoft Root Certificate Authority</value>
<value />
<value>05/10/2021 07:28:13</value>
<value>05/10/2001 07:19:22</value>
<value>CDD4EEAE6000AC7F40C3802C171E30148030C072</value>
<value>CN=Microsoft Root Certificate Authority, DC=microsoft, DC=com</value>
</row>
<row>
<value>Thawte Timestamping CA</value>
<value />
<value>01/01/2021 07:59:59</value>
<value>01/01/1997 08:00:00</value>
<value>BE36A4562FB2EE05DBB3D32323ADF445084ED656</value>
<value>CN=Thawte Timestamping CA, OU=Thawte Certification, O=Thawte, L=Durbanville, S=Western Cape, C=ZA</value>
</row>
<row>
<value>Microsoft Root Authority</value>
<value />
<value>12/31/2020 15:00:00</value>
<value>01/10/1997 15:00:00</value>
<value>A43489159A520F0D93D032CCAF37E7FE20A8B419</value>
<value>CN=Microsoft Root Authority, OU=Microsoft Corporation, OU=Copyright (c) 1997 Microsoft Corp.</value>
</row>
</data>
I need be able to parse data with this structure into SQL table to be able to work with it in the future. The expected result would be SQL table, looking something like this:
DeviceID VolumeName SizeGB FreeSpaceGB FreeSpacePercent
-------- --------------- ------ ----------- ----------------
C: c$OS-COMPUTER01 126 43 34
D: D$DB-COMPUTER01 127 104 82
E: E$COMPUTER01 127 106 84
Currently I parse the head data and row data into temp tables and iterating over the respective tables building dynamic SQL code and then executing it to create another table and insert parsed data into it.
--code snippet which parses the rows
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) n, xx.value('.', 'VARCHAR(MAX)') v INTO #rows
FROM (values(@x)) t1(x)
CROSS APPLY x.nodes('//row/value') t2(xx)
...
--code snippet building the INSERT queries
WHILE (@loopcounter * @columns) < @rows
BEGIN
DECLARE @insertstatement VARCHAR(MAX)
SET @insertstatement = 'INSERT INTO #result VALUES ('
DECLARE @forcounter INT = 1
WHILE @forcounter <= @columns
BEGIN
DECLARE @rownumber int = (@loopcounter * @columns) + @forcounter
SELECT @insertstatement += '''' + REPLACE(v, '''', '''''') + ''', ' FROM #rows WHERE n = @rownumber
SET @forcounter += 1
END
SET @insertstatement = SUBSTRING(@insertstatement, 1, LEN(@insertstatement) - 1)
SET @insertstatement += ')'
EXEC (@insertstatement)
SET @loopcounter += 1
END
...
I have searched this forum, but have not found solution, which would work dynamically with column names stored in such way. Could you please advise what can be done?
Upvotes: 0
Views: 2049
Reputation: 67321
The following query will retrieve all data in a non-pivoted format:
DECLARE @xml XML=
'<root>
<head>
<value>DeviceID</value>
<value>VolumeName</value>
<value>SizeGB</value>
<value>FreeSpaceGB</value>
<value>FreeSpacePercent</value>
</head>
<row>
<value>C:</value>
<value>c$OS-COMPUTER01</value>
<value>126</value>
<value>43</value>
<value>34</value>
</row>
<row>
<value>D:</value>
<value>D$DB-COMPUTER01</value>
<value>127</value>
<value>104</value>
<value>82</value>
</row>
<row>
<value>E:</value>
<value>E$COMPUTER01</value>
<value>127</value>
<value>106</value>
<value>84</value>
</row>
</root>';
--the query
WITH ColumnNames AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS ColumnInx
,c.value(N'text()[1]','nvarchar(max)') AS ColumnName
FROM @xml.nodes(N'/root/head/value') AS A(c)
)
,TheRows AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RowInx
,r.query(N'value') AS TheValues
FROM @xml.nodes(N'/root/row') AS B(r)
)
SELECT r.RowInx
,cn.ColumnInx
,cn.ColumnName
,r.TheValues.value(N'value[sql:column("ColumnInx")][1]','nvarchar(max)') AS row_value
FROM TheRows AS r
CROSS JOIN ColumnNames AS cn;
The result
+--------+-----------+------------------+-----------------+
| RowInx | ColumnInx | ColumnName | row_value |
+--------+-----------+------------------+-----------------+
| 1 | 1 | DeviceID | C: |
+--------+-----------+------------------+-----------------+
| 1 | 2 | VolumeName | c$OS-COMPUTER01 |
+--------+-----------+------------------+-----------------+
| 1 | 3 | SizeGB | 126 |
+--------+-----------+------------------+-----------------+
| 1 | 4 | FreeSpaceGB | 43 |
+--------+-----------+------------------+-----------------+
| 1 | 5 | FreeSpacePercent | 34 |
+--------+-----------+------------------+-----------------+
| 2 | 1 | DeviceID | D: |
+--------+-----------+------------------+-----------------+
| 2 | 2 | VolumeName | D$DB-COMPUTER01 |
+--------+-----------+------------------+-----------------+
| 2 | 3 | SizeGB | 127 |
+--------+-----------+------------------+-----------------+
| 2 | 4 | FreeSpaceGB | 104 |
+--------+-----------+------------------+-----------------+
| 2 | 5 | FreeSpacePercent | 82 |
+--------+-----------+------------------+-----------------+
| 3 | 1 | DeviceID | E: |
+--------+-----------+------------------+-----------------+
| 3 | 2 | VolumeName | E$COMPUTER01 |
+--------+-----------+------------------+-----------------+
| 3 | 3 | SizeGB | 127 |
+--------+-----------+------------------+-----------------+
| 3 | 4 | FreeSpaceGB | 106 |
+--------+-----------+------------------+-----------------+
| 3 | 5 | FreeSpacePercent | 84 |
+--------+-----------+------------------+-----------------+
If you need your table format this is done with a dynamically created PIVOT
command. Here is one example how to do that.
Upvotes: 1