Puc
Puc

Reputation: 11

SQL Server to produce XML data rows from JOINed select statement

I have a three tables in SQL Server 2008 which are setup as follows:

EMPLOYEE TABLE

empid(PK)
1
2

joined to EMPLOYEEATTRIBUTES

dataId(PK) | empId(FK) | attributeid | attributeVal
10 | 1 | A1 | somevalue1
20 | 1 | A2 | somevalue2
30 | 2 | A1 | somevalue3
40 | 2 | A3 | somevalue4

joined to ATTRIBUTES

attributeid | attributeName
A1 | attribute1
A2 | attribute2
A3 | attribute3

I need to get the xml data out into the following format

<rows>
   <row empid="1">
     <attribute1>somevalue1</attribute1>
     <attribute2>somevalue2</attribute1>
   </row>
   <row empid="2">
     <attribute1>somevalue3</attribute1>
     <attribute3>somevalue4</attribute1>
   </row>
</rows>

Anyone know how this can be done??

Upvotes: 1

Views: 1642

Answers (3)

WReach
WReach

Reputation: 18271

If you want to skip all of the gory details and just see an answer, look at the SQL query at the bottom of this posting.

The main challenge here is that the various SQL Server FOR XML options cannot generate the dynamic element names stipulated in the desired output. Therefore, my first answer is to consider simply returning a conventional SQL result set and having the client generate the XML. It is a very simple streaming transformation. However, this might not be an option for you, so we continue on the path of having SQL Server generate the XML.

My second thought was to use SQL Server's built-in XQuery functionality to perform the transformation, thus:

/* WARNING: the following SQL does not work */
SELECT
  CAST((SELECT * FROM data FOR XML RAW) AS XML)
    .query('
      <rows>
        {
          for $empId in distinct-values(/row/@empId)
          return
            <row empid="{$empId}">
            {
              for $attr in /row[@empId = $empId]
              return
                attribute { "attribute" } { $attr/@attributeValue }
            }
            </row>
        }
      </rows>
    ')

Alas, this does not work. SQL Server complains:

Msg 9315, Level 16, State 1, Line 25
XQuery [query()]: Only constant expressions are supported for the name expression
of computed element and attribute constructors.

Apparently, the XQuery implementation suffers from the same limitation as the FOR XML features. So, my second answer is to suggest generating the XML on the client side :) But if you insist on generating the XML from SQL, then fasten your seatbelts...

The overall strategy is going to be to abandon SQL Server's native facilities for SQL generation. Instead, we are going to build up the XML document using string concatenation. If this approach is offensive, you can stop reading now :)

Let's start with generating a sample dataset to play with:

SELECT NULL AS empId INTO employee WHERE 1=0
UNION SELECT 1
UNION SELECT 2

SELECT NULL AS dataId, NULL AS empId, NULL AS attributeId, NULL AS attributeVal INTO employeeAttributes WHERE 1=0
UNION SELECT 10, 1, 'A1', 'someValue1'
UNION SELECT 20, 1, 'A2', 'someValue2'
UNION SELECT 30, 2, 'A1', 'someValue3'
UNION SELECT 40, 2, 'A3', 'someValue4 & <>!'

SELECT NULL AS attributeId, NULL AS attributeName INTO attributes WHERE 1=0
UNION SELECT 'A1', 'attribute1'
UNION SELECT 'A2', 'attribute2'
UNION SELECT 'A3', 'attribute3'

Note that I have changed the value of the last attribute in the provided example to include some XML-unfriendly characters.

Now, put together a basic SQL query to perform the necessary joins:

SELECT
  e.empId
, a.attributeName
, ea.attributeVal
FROM employee AS e
INNER JOIN employeeAttributes AS ea
  ON ea.empId = e.empId
INNER JOIN attributes AS a
  ON a.attributeId = ea.attributeId

which gives this result:

empId   attributeName   attributeVal
1       attribute1      someValue1
1       attribute2      someValue2
2       attribute1      someValue3
2       attribute3      someValue4 & <>!

Those funny characters in the last attribute are going to give us trouble. Let's change the query to escape them.

; WITH
  cruftyData AS (
    SELECT
      e.empId
      , a.attributeName
      , (SELECT ea.attributeVal AS x FOR XML RAW) AS attributeValXml
      FROM employee AS e
      INNER JOIN employeeAttributes AS ea
        ON ea.empId = e.empId
      INNER JOIN attributes AS a
        ON a.attributeId = ea.attributeId
    )
, data AS (
    SELECT
      empId
    , attributeName
    , SUBSTRING(attributeValXml, 9, LEN(attributeValXml)-11) AS attributeVal
    FROM cruftyData
  )
SELECT * FROM data

with results:

empId   attributeName   attributeValXml
1       attribute1      someValue1
1       attribute2      someValue2
2       attribute1      someValue3
2       attribute3      someValue4 &amp; &lt;&gt;!

This ensures that attribute values can now be safely used in an XML document. What about attribute names? The rules for XML attribute names are more strict than those for element content. We will assume that the attributes names are valid XML identifiers. If this is not true, then some scheme will need to be devised to convert the names in the database to valid XML names. This is left as an exercise to the reader :)

The next challenge is to ensure that the attributes are grouped together for each employee, and we can tell when we are at the first or last value in a group. Here is the updated query:

; WITH
  cruftyData AS (
    SELECT
      e.empId
      , a.attributeName
      , (SELECT ea.attributeVal AS x FOR XML RAW) AS attributeValXml
      FROM employee AS e
      INNER JOIN employeeAttributes AS ea
        ON ea.empId = e.empId
      INNER JOIN attributes AS a
        ON a.attributeId = ea.attributeId
    )
, data AS (
    SELECT
      empId
    , attributeName
    , SUBSTRING(attributeValXml, 9, LEN(attributeValXml)-11) AS attributeVal
    , ROW_NUMBER() OVER (PARTITION BY empId ORDER BY attributeName DESC) AS down
    , ROW_NUMBER() OVER (PARTITION BY empId ORDER BY attributeName) AS up
    FROM cruftyData
  )
SELECT * FROM data ORDER BY 1, 2

The only change is to add the down and up columns to the result set:

empId  attributeName   attributeVal                down  up
1      attribute1      someValue1                  2     1
1      attribute2      someValue2                  1     2
2      attribute1      someValue3                  2     1
2      attribute3      someValue4 &amp; &lt;&gt;!  1     2

We can now identify the first attribute for an employee because up will be 1. The last attribute can be identified in similar fashion using the down column.

Armed with all of this, we are now equipped to perform the nasty business of building up the XML result using string concatenation.

; WITH
  cruftyData AS (
    SELECT
      e.empId
      , a.attributeName
      , (SELECT ea.attributeVal AS x FOR XML RAW) AS attributeValXml
      FROM employee AS e
      INNER JOIN employeeAttributes AS ea
        ON ea.empId = e.empId
      INNER JOIN attributes AS a
        ON a.attributeId = ea.attributeId
    )
, data AS (
    SELECT
      empId
    , attributeName
    , SUBSTRING(attributeValXml, 9, LEN(attributeValXml)-11) AS attributeVal
    , ROW_NUMBER() OVER (PARTITION BY empId ORDER BY attributeName DESC) AS down
    , ROW_NUMBER() OVER (PARTITION BY empId ORDER BY attributeName) AS up
    FROM cruftyData
  )
, xmlData AS (
  SELECT
    empId
  , up
  , CASE WHEN up <> 1 THEN '' ELSE '<row id="' + CAST (empId AS NVARCHAR) + '">' END AS xml1
  , '<' + attributeName + '>' + attributeVal + '</' + attributeName + '>' AS xml2
  , CASE WHEN down <> 1 THEN '' ELSE '</row>' END AS xml3
  FROM data
)
SELECT xml1, xml2, xml3
--SELECT @result = @result + 'wombat' + xmlString
FROM xmlData
ORDER BY empId, up

with the result:

xml1          xml2                                                 xml3
<row id="1">  <attribute1>someValue1</attribute1>        
              <attribute2>someValue2</attribute2>                  </row>
<row id="2">  <attribute1>someValue3</attribute1>        
              <attribute3>someValue4 &amp; &lt;&gt;!</attribute3>  </row>

All that remains is to concatenate all of the rows together, and to add the root rows tags. Since T-SQL does not (yet) have a string concatenation aggregate function, we will resort to using a variable as an accumulator. Here is the final query, in all its hacky glory:

DECLARE @result AS NVARCHAR(MAX)
SELECT @result = '<rows>'

; WITH
  cruftyData AS (
    SELECT
      e.empId
      , a.attributeName
      , (SELECT ea.attributeVal AS x FOR XML RAW) AS attributeValXml
      FROM employee AS e
      INNER JOIN employeeAttributes AS ea
        ON ea.empId = e.empId
      INNER JOIN attributes AS a
        ON a.attributeId = ea.attributeId
    )
, data AS (
    SELECT
      empId
    , attributeName
    , SUBSTRING(attributeValXml, 9, LEN(attributeValXml)-11) AS attributeVal
    , ROW_NUMBER() OVER (PARTITION BY empId ORDER BY attributeName DESC) AS down
    , ROW_NUMBER() OVER (PARTITION BY empId ORDER BY attributeName) AS up
    FROM cruftyData
  )
, xmlData AS (
  SELECT
    empId
  , up
  , CASE WHEN up <> 1 THEN '' ELSE '<row id="' + CAST (empId AS NVARCHAR) + '">' END AS xml1
  , '<' + attributeName + '>' + attributeVal + '</' + attributeName + '>' AS xml2
  , CASE WHEN down <> 1 THEN '' ELSE '</row>' END AS xml3
  FROM data
)
SELECT @result = @result + xml1 + xml2 + xml3
FROM xmlData
ORDER BY empId, up

SELECT @result = @result + '</rows>'
SELECT @result

The XML ends up in the @result variable. You can check that it is well-formed XML using:

SELECT CAST(@result AS XML)

The final XML looks like this:

<rows><row id="1"><attribute1>someValue1</attribute1><attribute2>someValue2</attribute2></row><row id="2"><attribute1>someValue3</attribute1><attribute3>someValue4 &amp; &lt;&gt;!</attribute3></row></rows>

Upvotes: 2

Stuart Ainsworth
Stuart Ainsworth

Reputation: 12940

Here's an answer, but the PIVOT command limits you in that you have to know the name of your attributes in advance. With a little tweaking, you could probably do this dynamically (try searching for dynamic pivot in SQL Server 2005):

DECLARE @Employee TABLE ( empid INT )
DECLARE @EA TABLE
    (
      dataid INT
    , empid INT
    , attributeid CHAR(2)
    , AttributeVal VARCHAR(100)
    )
DECLARE @Attributes TABLE
    (
      AttributeID CHAR(2)
    , AttributeName VARCHAR(100)
    )

INSERT  INTO @Employee
VALUES  ( 1 ),
        ( 2 )

INSERT  INTO @EA
        ( dataid, empid, attributeid, AttributeVal )
VALUES  ( 10, 1, 'A1', 'somevalue1' )
    , ( 20, 1, 'A2', 'somevalue2' )
    , ( 30, 2, 'A1', 'somevalue3' )
    , ( 40, 2, 'A3', 'somevalue4' )

INSERT  INTO @Attributes
        ( AttributeID, AttributeName )
VALUES  ( 'A1', 'attribute1' )
        ,
        ( 'A2', 'attribute2' )
        ,
        ( 'A3', 'attribute3' )

SELECT  empID as '@empid'
      , attribute1
      , attribute2
      , attribute3
      , attribute4
FROM    ( SELECT    e.empid
                  , a.AttributeName
                  , ea.AttributeVal
          FROM      @Employee e
                    JOIN @EA ea ON e.empid = ea.empid
                    JOIN @Attributes a ON ea.attributeid = a.attributeid
        ) ps PIVOT
( MIN(AttributeVal) FOR AttributeName IN ( [attribute1], [attribute2], [attribute3], [attribute4] ) ) AS pvt    
FOR XML PATH('row'), ROOT('rows')

Upvotes: 0

marc_s
marc_s

Reputation: 754538

You can get close - but you can't get your desired output 100%.

Using this query:

SELECT
    EmpID AS '@empid',
    (
        SELECT 
           a.AttributeName AS '@name',
           ea.AttributeVal
        FROM dbo.EmployeeAttributes ea 
        INNER JOIN dbo.Attributes a ON ea.AttributeId = a.AttributeId
        WHERE ea.EmpID = e.EmpID
        FOR XML PATH ('attribute'), TYPE
    )
FROM dbo.Employee e
FOR XML PATH('row'), ROOT('rows')

you get this output:

<rows>
  <row empid="1">
    <attribute name="Attribute1">
      <AttributeVal>SomeValue1</AttributeVal>
    </attribute>
    <attribute name="attribute2">
      <AttributeVal>SomeValue2</AttributeVal>
    </attribute>
  </row>
  <row empid="2">
    <attribute name="Attribute1">
      <AttributeVal>SomeValue3</AttributeVal>
    </attribute>
    <attribute name="attribute3">
      <AttributeVal>SomeValue4</AttributeVal>
    </attribute>
  </row>
</rows>

What you cannot do is make the inner XML nodes have tag names that match the attribute name - you have to use some fixed tag name (like <attribute> in my sample), and then apply the values that are retrieved from your tables as either attributes on those XML tags (like the name= attribute in my sample) or as XML element values.

As far as I know, there is no way to use the AttributeValue as the XML tag name....

Upvotes: 1

Related Questions