Why is there a Sort in execution plan when using XQuery

I have a table with 10,000,000 rows in the SQL Server

CREATE TABLE [dbo].[tmpTable](
    [EventId] [int] NULL,
    [Data] [nvarchar](max) NULL
) 
CREATE NONCLUSTERED INDEX [ci] ON [dbo].[tmpTable]
(
    [EventId] ASC
)
INCLUDE ([Data])

the Data Column is XML

when i used the SQL,

    SELECT  EventId
  , CAST(Data AS xml ).value('(/d/nv/@v)[1]', 'uniqueidentifier') AS ID1
  , CAST(Data AS xml ).value('(/d/nv/@v)[2]', 'int') AS ID2
  , CAST(Data AS xml ).value('(/d/nv/@v)[3]', 'bigint') AS ID3
  , CAST(Data AS xml ).value('(/d/nv/@v)[4]', 'bit') AS ID4
  , CAST(Data AS xml ).value('(/d/nv/@v)[5]', 'nvarchar(100)') AS ID5
  , CAST(Data AS xml ).value('(/d/nv/@v)[6]', 'nvarchar(100)') AS ID6
  , CAST(Data AS xml ).value('(/d/nv/@v)[7]', 'bigint') AS ID7
  , CAST(Data AS xml ).value('(/d/nv/@v)[8]', 'int') AS ID8
    FROM tmpTable
    WHERE EventId = 100

there will be Sort in the execution plan? which kill the performance. but, if i use

Select Top 100

the sort will gone. any idea? and is there any way to remove the sort

Execution plan:

|--Compute Scalar(DEFINE:([Expr1011]=[Expr1010], [Expr1022]=[Expr1021], [Expr1033]=[Expr1032], [Expr1044]=[Expr1043], [Expr1055]=[Expr1054], [Expr1066]=[Expr1065], [Expr1077]=[Expr1076], [Expr1088]=[Expr1087]))
       |--Parallelism(Gather Streams)
            |--Nested Loops(Inner Join, OUTER REFERENCES:([tmpTable].[dbo].[tmptmpTable].[Data]))
                 |--Nested Loops(Inner Join, OUTER REFERENCES:([tmpTable].[dbo].[tmptmpTable].[Data]))
                 |    |--Nested Loops(Inner Join, OUTER REFERENCES:([tmpTable].[dbo].[tmptmpTable].[Data]))
                 |    |    |--Nested Loops(Inner Join, OUTER REFERENCES:([tmpTable].[dbo].[tmptmpTable].[Data]))
                 |    |    |    |--Nested Loops(Inner Join, OUTER REFERENCES:([tmpTable].[dbo].[tmptmpTable].[Data]))
                 |    |    |    |    |--Nested Loops(Inner Join, OUTER REFERENCES:([tmpTable].[dbo].[tmptmpTable].[Data]))
                 |    |    |    |    |    |--Nested Loops(Inner Join, OUTER REFERENCES:([tmpTable].[dbo].[tmptmpTable].[Data]))
                 |    |    |    |    |    |    |--Nested Loops(Inner Join, OUTER REFERENCES:([tmpTable].[dbo].[tmptmpTable].[Data]))
                 |    |    |    |    |    |    |    |--Sort(ORDER BY:([tmpTable].[dbo].[tmptmpTable].[Data] ASC))
                 |    |    |    |    |    |    |    |    |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([tmpTable].[dbo].[tmptmpTable].[Data]))
                 |    |    |    |    |    |    |    |         |--Table Scan(OBJECT:([tmpTable].[dbo].[tmptmpTable]), WHERE:([tmpTable].[dbo].[tmptmpTable].[EventId]=(100)))

Sample data

<d>
  <nv n="MediaDesc" v="79cc07e3-8d4a-4c8a-bc9f-3fcba485532b" />
  <nv n="ChannelNumber" v="116" />
  <nv n="Duration" v="61773" />
  <nv n="IsTunedToService" v="True" />
  <nv n="StreamSelection" v="FULLSCREEN_SECONDARY" />
  <nv n="ChannelType" v="LiveTVMediaChannel" />
  <nv n="TuneID" v="634050840267464082" />
</d>

Upvotes: 3

Answers (2)

Mikael Eriksson

Reputation: 138960

You will get better performance if you change the datatype of your column to XML instead of casting to XML.

Casting to XML is an expensive operation. SQL Server does its best to give you the result you need as fast as possible and in this case it looks like it uses a table spool to get the work done. The spool is not visible in the plan you have provided but I believe it is there down in each branch inside the nested loops join.

Some code that reproduces what you see.

create table T(S nvarchar(max) null);

go

insert into T(S) 
select top(50) '<N>1</N><N>2</N>'
from sys.columns;

insert into T(S) 
select top(50) '<N>3</N><N>4</N>'
from sys.columns;

Your query:

select cast(T.S as xml).value('(N/text())[2]', 'int')
from T;

Query plan:

enter image description here

The lazy spool caches the last result set generated and as long as the value from the outer part is the same, the spool can return the cached result set (rewind). But when the current input value is different from the previous input value the spool has to generate the result for the new value (rebind). The optimizer adds the sort operator to minimize the number of rebinds.

You can use trace flag (8690) to remove the optimization that creates a plan with a sort and a spool. It is not documented by Microsoft so don't use it in production but you can use it in tests to see what effects the optimization has in your system.

select cast(T.S as xml).value('(N/text())[2]', 'int')
from T
option (querytraceon 8690);

More info on what the Lazy Spool is and how it works can be found in this answer by Paul White on SQLPerformance.com.

Upvotes: 3

Rhys Jones

Reputation: 5518

Looks like the SORT is related to the parrallelisation. Add OPTION (MAXDOP 1) to prevent parallel processing. Also, if EventId is unique then make the index unique.

EDIT

There's no clustered index on the table - SORT could be connected to it being a HEAP. Make the index [ci] clustered.

Upvotes: 0

Why is there a Sort in execution plan when using XQuery

Answers (2)

Related Questions