drew
drew

Reputation: 127

How to encode XML in T SQL without the additional XML overhead

I have a database which (For whatever reason) has a column containing pipe delimited data.

I want to parse this data quickly, so I've thought of converting this column (nvarchar) into an XML by replacing the pipes with XML attributes and putting it into an XML data typed column somewhere else.

It works, except in the case where that column had a character that required encoding, such a '<' character.

I found I could encode XML using FOR XML clause, however, that appears to inject some XML tags around the data.

For example: (this gives error on bad character)

SELECT CAST('<f>' + replace(value,'|','</f><f>') + '</f>' AS XML)
FROM TABLE

this gives xml encoded value, but wraps it in "< value> < /value>" tag

  SELECT value
   FROM table
  FOR XML PATH('')

Any ideas on how I can get the XML encoded value without this extra tag added, so I can convert the pipe format to XML after it's done (preferably in one swoop)?

EDIT: since people are asking, this is what 5 potential rows of data might look like

foo
foo|bar
foo|bar|1
foo||
baz|

And the results would be

Col1, Col2, Col3
foo,null,null
foo,bar,null
foo,bar,1
foo,null,null
baz,null,null

I'm achieving this by using the resulting XML type in a sub query such as: (it can be up to 4 columns pr 3 pipes in any given row)

    SELECT 
          *,
         x.query('f[1]').value('.','nVarChar(2048)')  Col1
        ,x.query('f[2]').value('.','nVarChar(2048)')  Col2
        ,x.query('f[3]').value('.','nvarchar(2048)')  Col3
        ,x.query('f[4]').value('.','nvarchar(2048)')  Col4
    FROM
    (
        SELECT *,
        CAST('<f>' + REPLACE(Value,'|','</f><f>') + '</f>' AS XML) as x
     FROM  table
    ) y

@srutzky makes a great point. No, I do not need to do XML here at all. If I can find a fast & clean way to parse pipes in a set based operation, I'll do that. Will review the SQL# documentation...

Upvotes: 3

Views: 3554

Answers (3)

Gottfried Lesigang
Gottfried Lesigang

Reputation: 67291

Your idea was absolutely OK: By making an XML out of your string the XML engine will convert all special characters properly. After your splitting the XML should be correct.

If your string is stored in a column you can avoid the automatically given name by either doing kind of computation (something like '' + YourColumn) or you give the column an alias AS [*]:

Try it like this:

DECLARE @str VARCHAR(100)='300|2€&ÄÖÜ|This is text -> should be text|2015-12-31';

SELECT @str FOR XML PATH('');
/*
300|2€&amp;ÄÖÜ|This is text -&gt; should be text|2015-12-31
*/

DECLARE @Xml XML=(SELECT CAST('<x>' + REPLACE((SELECT @str FOR XML PATH('')),'|','</x><x>')+'</x>' AS XML));

SELECT @Xml.value('/x[1]','int') AS IntTypeSave
      ,@Xml.value('/x[3]','varchar(max)') AS VarcharTypeSave
      ,@Xml.value('/x[4]','datetime') AS DateTypeSave;

/*
300           This is text -> should be text          2015-12-31 00:00:00.000
*/

SELECT X.value('.','varchar(max)') AS EachX
FROM @Xml.nodes('/x') AS Each(X);

/*
300
2€&ÄÖÜ
This is text -> should be text
2015-12-31
*/

Upvotes: 1

Dave
Dave

Reputation: 740

You could try the following BUT you need to make sure the content is "xml safe", in other words the content does not contain values which xml will reject (look into xml element content parsing). Try the following...it's test script to see if it does what you want..

UPDATE:

ok, it might help if I read the question all the way through...2 steps...split the pipes and then xml all the split items...try this:

Create the following function:

CREATE FUNCTION [dbo].[udf_SPLIT]
(
  @s nvarchar(max),
  @trimPieces bit,
  @returnEmptyStrings bit,
  @delimiter nvarchar(10)
)
RETURNS @t TABLE (val nvarchar(max))
AS
BEGIN
    DECLARE @i int, @j int

    SELECT @i = 0, @j = (LEN(@s) - LEN(REPLACE(@s,@delimiter,'')))

    ;WITH cte AS
    (
        SELECT i = @i + 1,
            s = @s,
            n = substring(@s, 0, charindex(@delimiter, @s)),
            m = substring(@s, charindex(@delimiter, @s)+1, len(@s) - charindex(@delimiter, @s))
            UNION ALL
        SELECT i = cte.i + 1,
            s = cte.m, 
            n = substring(cte.m, 0, charindex(@delimiter, cte.m)),
            m = substring(cte.m, charindex(@delimiter, cte.m) + 1, len(cte.m)-charindex(@delimiter, cte.m))
        FROM cte
        WHERE i <= @j
    )
    INSERT INTO @t (val)
        SELECT [pieces]
        FROM (
            SELECT CASE 
                WHEN @trimPieces = 1 THEN LTRIM(RTRIM(CASE WHEN i <= @j THEN n ELSE m END))
                ELSE CASE WHEN i <= @j THEN n ELSE m END
                END AS [pieces]
            FROM cte
            ) t
        WHERE (@returnEmptyStrings = 0 AND LEN(pieces) > 0)
        OR (@returnEmptyStrings = 1)
        OPTION (maxrecursion 0)

RETURN

END

next try the following to test...

DECLARE @str nvarchar(500) = 'test|<html>this</html>|boogie woogie| SDGDSFG| game<br /> on |working| this|'

SELECT REPLACE(
        REPLACE(
            REPLACE(
                REPLACE([val],'&','&amp;')
                ,'"','&quot;')
            ,'<','&lt;')
        ,'>','&gt;')
     AS [f]
FROM [dbo].[udf_SPLIT](@str,1,0,'|')
FOR XML PATH('')

If not totally correct, hopefully will put you on right path...

HTH

Dave

Upvotes: 1

Bruce Dunwiddie
Bruce Dunwiddie

Reputation: 2908

SELECT CAST('<values><f>' + 
    REPLACE(
        REPLACE(
            REPLACE(
                REPLACE(
                    REPLACE(value,'&','&amp;')
                    ,'"','&quot;')
                ,'<','&lt;')
            ,'>','&gt;')
        ,'|','</f><f>') + '</f></values>' AS XML)
FROM TABLE;

Upvotes: 1

Related Questions