Reputation: 31
I have a table with 10,000,000 rows in the SQL Server
CREATE TABLE [dbo].[tmpTable](
[EventId] [int] NULL,
[Data] [nvarchar](max) NULL
)
CREATE NONCLUSTERED INDEX [ci] ON [dbo].[tmpTable]
(
[EventId] ASC
)
INCLUDE ([Data])
the Data Column is XML
when i used the SQL,
SELECT EventId
, CAST(Data AS xml ).value('(/d/nv/@v)[1]', 'uniqueidentifier') AS ID1
, CAST(Data AS xml ).value('(/d/nv/@v)[2]', 'int') AS ID2
, CAST(Data AS xml ).value('(/d/nv/@v)[3]', 'bigint') AS ID3
, CAST(Data AS xml ).value('(/d/nv/@v)[4]', 'bit') AS ID4
, CAST(Data AS xml ).value('(/d/nv/@v)[5]', 'nvarchar(100)') AS ID5
, CAST(Data AS xml ).value('(/d/nv/@v)[6]', 'nvarchar(100)') AS ID6
, CAST(Data AS xml ).value('(/d/nv/@v)[7]', 'bigint') AS ID7
, CAST(Data AS xml ).value('(/d/nv/@v)[8]', 'int') AS ID8
FROM tmpTable
WHERE EventId = 100
there will be Sort in the execution plan? which kill the performance. but, if i use
Select Top 100
the sort will gone. any idea? and is there any way to remove the sort
Execution plan:
|--Compute Scalar(DEFINE:([Expr1011]=[Expr1010], [Expr1022]=[Expr1021], [Expr1033]=[Expr1032], [Expr1044]=[Expr1043], [Expr1055]=[Expr1054], [Expr1066]=[Expr1065], [Expr1077]=[Expr1076], [Expr1088]=[Expr1087]))
|--Parallelism(Gather Streams)
|--Nested Loops(Inner Join, OUTER REFERENCES:([tmpTable].[dbo].[tmptmpTable].[Data]))
|--Nested Loops(Inner Join, OUTER REFERENCES:([tmpTable].[dbo].[tmptmpTable].[Data]))
| |--Nested Loops(Inner Join, OUTER REFERENCES:([tmpTable].[dbo].[tmptmpTable].[Data]))
| | |--Nested Loops(Inner Join, OUTER REFERENCES:([tmpTable].[dbo].[tmptmpTable].[Data]))
| | | |--Nested Loops(Inner Join, OUTER REFERENCES:([tmpTable].[dbo].[tmptmpTable].[Data]))
| | | | |--Nested Loops(Inner Join, OUTER REFERENCES:([tmpTable].[dbo].[tmptmpTable].[Data]))
| | | | | |--Nested Loops(Inner Join, OUTER REFERENCES:([tmpTable].[dbo].[tmptmpTable].[Data]))
| | | | | | |--Nested Loops(Inner Join, OUTER REFERENCES:([tmpTable].[dbo].[tmptmpTable].[Data]))
| | | | | | | |--Sort(ORDER BY:([tmpTable].[dbo].[tmptmpTable].[Data] ASC))
| | | | | | | | |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([tmpTable].[dbo].[tmptmpTable].[Data]))
| | | | | | | | |--Table Scan(OBJECT:([tmpTable].[dbo].[tmptmpTable]), WHERE:([tmpTable].[dbo].[tmptmpTable].[EventId]=(100)))
Sample data
<d>
<nv n="MediaDesc" v="79cc07e3-8d4a-4c8a-bc9f-3fcba485532b" />
<nv n="ChannelNumber" v="116" />
<nv n="Duration" v="61773" />
<nv n="IsTunedToService" v="True" />
<nv n="StreamSelection" v="FULLSCREEN_SECONDARY" />
<nv n="ChannelType" v="LiveTVMediaChannel" />
<nv n="TuneID" v="634050840267464082" />
</d>
Upvotes: 3
Views: 213
Reputation: 138960
You will get better performance if you change the datatype of your column to XML instead of casting to XML.
Casting to XML is an expensive operation. SQL Server does its best to give you the result you need as fast as possible and in this case it looks like it uses a table spool to get the work done. The spool is not visible in the plan you have provided but I believe it is there down in each branch inside the nested loops join.
Some code that reproduces what you see.
create table T(S nvarchar(max) null);
go
insert into T(S)
select top(50) '<N>1</N><N>2</N>'
from sys.columns;
insert into T(S)
select top(50) '<N>3</N><N>4</N>'
from sys.columns;
Your query:
select cast(T.S as xml).value('(N/text())[2]', 'int')
from T;
Query plan:
The lazy spool caches the last result set generated and as long as the value from the outer part is the same, the spool can return the cached result set (rewind). But when the current input value is different from the previous input value the spool has to generate the result for the new value (rebind). The optimizer adds the sort operator to minimize the number of rebinds.
You can use trace flag (8690) to remove the optimization that creates a plan with a sort and a spool. It is not documented by Microsoft so don't use it in production but you can use it in tests to see what effects the optimization has in your system.
select cast(T.S as xml).value('(N/text())[2]', 'int')
from T
option (querytraceon 8690);
More info on what the Lazy Spool is and how it works can be found in this answer by Paul White on SQLPerformance.com.
Upvotes: 3
Reputation: 5518
Looks like the SORT is related to the parrallelisation. Add OPTION (MAXDOP 1)
to prevent parallel processing. Also, if EventId is unique then make the index unique.
EDIT
There's no clustered index on the table - SORT could be connected to it being a HEAP. Make the index [ci] clustered.
Upvotes: 0