Reputation: 593
I wanted to migrate from XslCompiledTransform to Saxon 9.7.0.6 HE to get XPath 2.0/XSLT 2.0 support, but it is much slower than the .NET processor.
I tested each version with a default identity-copy XSLT and 15,000 XML files:
Saxon with Parallel.ForEach: 00:05:02.9013605
XslCompiledTransform with Parallel.ForEach: 00:00:15.6724146
Saxon with foreach: 00:10:09.7763861
XslCompiledTransform with foreach: 00:03:00.3483324
I hope I am doing something wrong. The XslCompiledTransform version:
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(xsl);
XmlWriterSettings writerSettings = xslt.OutputSettings.Clone();
XmlReaderSettings readerSettings = new XmlReaderSettings();
readerSettings.DtdProcessing = DtdProcessing.Ignore;
readerSettings.XmlResolver = null;
Parallel.ForEach(files, file =>
{
    string target = Path.Combine(output, Path.GetFileName(file));
    using (XmlReader xr = XmlReader.Create(file, readerSettings))
    using (XmlWriter xw = XmlWriter.Create(target, writerSettings))
        xslt.Transform(xr, xw);
});
The Saxon version:
Processor processor = new Processor();
DocumentBuilder docBuilder = processor.NewDocumentBuilder();
docBuilder.DtdValidation = false;
docBuilder.SchemaValidationMode = SchemaValidationMode.None;
docBuilder.WhitespacePolicy = WhitespacePolicy.PreserveAll;
XsltCompiler compiler = processor.NewXsltCompiler();
XsltExecutable executable = compiler.Compile(new Uri(xsl));
Parallel.ForEach(files, file =>
{
    string target = Path.Combine(output, Path.GetFileName(file));
    XsltTransformer transformer = executable.Load();
    XdmNode input = docBuilder.Build(new Uri(file));
    transformer.InitialContextNode = input;
    Serializer serializer = new Serializer();
    serializer.SetOutputFile(target);
    transformer.Run(serializer);
});
Update
I did another test without Visual Studio debugging and it got a lot better:
Saxon: 00:00:41.5990128
XslCompiledTransform: 00:00:19.0441044
So the main slowdown was the debugger itself, but only for Saxon. Saxon now takes about twice as long as the .NET version. That is not great, but I think I can live with it.
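Since the attached debugger distorted the first set of numbers so badly, a benchmark harness can at least warn about it. The following is a minimal sketch; `BenchmarkGuard` and `Time` are hypothetical names introduced here for illustration, while `Debugger.IsAttached` and `Stopwatch` are standard .NET members.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of Saxon or the .NET framework.
public static class BenchmarkGuard
{
    // Times one run of 'work' and warns on stderr if a debugger is attached,
    // because timings taken under a debugger may not be representative.
    public static TimeSpan Time(Action work)
    {
        if (Debugger.IsAttached)
            Console.Error.WriteLine("Warning: debugger attached; timings are not representative.");

        Stopwatch sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

The whole Parallel.ForEach call from either version above could be passed as the `work` delegate.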
Is there anything I can do to make Saxon faster? Maybe by changing the code or by using EE instead of HE?
Here is some detailed benchmark information. The main performance problem is the DocumentBuilder.Build method, but even the transform itself is more than twice as slow as the .NET version:
[Image: benchmark details for Saxon]
[Image: benchmark details for .NET]
Upvotes: 2
Views: 830
Reputation: 593
I repeated the test using DocumentBuilder.Build(XmlReader) for Saxon and ran both tests three times each.
Console.WriteLine("Saxon:");
for (int i = 0; i < 3; i++)
{
    sw.Reset();
    sw.Start();
    Parallel.ForEach(files, file =>
    {
        string target = Path.Combine(output, Path.GetFileName(file));
        XsltTransformer transformer = executable.Load();
        XdmNode input = null;
        using (XmlReader xr = XmlReader.Create(file, readerSettings))
            input = docBuilder.Build(xr);
        transformer.InitialContextNode = input;
        Serializer serializer = new Serializer();
        serializer.SetOutputFile(target);
        transformer.Run(serializer);
    });
    sw.Stop();
    Console.WriteLine("Duration: " + sw.Elapsed);
    RemoveFiles(output);
}
and
Console.WriteLine("XslCompiledTransform:");
for (int i = 0; i < 3; i++)
{
    sw.Reset();
    sw.Start();
    Parallel.ForEach(files, file =>
    {
        string target = Path.Combine(output, Path.GetFileName(file));
        using (XmlReader xr = XmlReader.Create(file, readerSettings))
        using (XmlWriter xw = XmlWriter.Create(target, writerSettings))
            xslt.Transform(xr, xw);
    });
    sw.Stop();
    Console.WriteLine("Duration: " + sw.Elapsed);
    RemoveFiles(output);
}
The results are:
Saxon: 210.679ms
XslCompiledTransform: 179.129ms
I think this is a great result: the Saxon version needs only 17.61% more time than the XslCompiledTransform version (210.679 / 179.129 ≈ 1.1761). I can use XPath 2.0 and XSLT 2.0 with a performance loss of less than 20%.
[Image: results for Saxon]
[Image: results for XslCompiledTransform]
Upvotes: 0
Reputation: 163468
With performance, the devil is always in the detail. This sounds like a scenario that is worth doing some detailed study, so if you can supply us (Saxonica) with everything we need to run it, we'll be happy to take a look.
The first thing that's noticeable from your numbers is that the MS processor gets a much bigger speed-up from parallelizing than Saxon does. That could be because of NamePool contention: we've done a lot to reduce NamePool contention over recent releases, but that's for "typical workloads", and we would need to examine, for example, whether your documents are all using the same vocabulary of names.
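To put a number on that difference, the parallel speed-up can be computed from the four timings quoted in the question. This is only a small arithmetic sketch: the class and method names are made up for illustration, and the underlying timings were taken with the debugger attached, so the ratios are indicative at best.

```csharp
using System;
using System.Globalization;

// Illustrative helper; the timings are the ones quoted in the question.
public static class SpeedUp
{
    // Parses an elapsed time in the constant "c" format (hh:mm:ss.fffffff).
    private static TimeSpan Parse(string s)
    {
        return TimeSpan.ParseExact(s, "c", CultureInfo.InvariantCulture);
    }

    // Ratio of sequential (foreach) to parallel (Parallel.ForEach) wall-clock time.
    public static double Ratio(string sequential, string parallel)
    {
        return Parse(sequential).TotalSeconds / Parse(parallel).TotalSeconds;
    }

    public static string Report()
    {
        double saxon = Ratio("00:10:09.7763861", "00:05:02.9013605");
        double ms = Ratio("00:03:00.3483324", "00:00:15.6724146");
        // Yields "Saxon: 2.0x, XslCompiledTransform: 11.5x"
        return string.Format(CultureInfo.InvariantCulture,
            "Saxon: {0:F1}x, XslCompiledTransform: {1:F1}x", saxon, ms);
    }
}
```

So, on those numbers, Saxon gained roughly a factor of 2 from Parallel.ForEach and XslCompiledTransform roughly a factor of 11.5.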
The first thing I would want to establish is how much of the cost is document building and how much is transformation. Depending on the answer, subsequent investigation will take a completely different course. (Serialization cost for the result tree could also be a factor, but that would be unusual.)
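One way to separate the two costs inside a Parallel.ForEach loop is to accumulate the elapsed time of each phase in a thread-safe counter. The sketch below is an assumption about how that could be done, not a Saxon facility: `PhaseTimer` and `Measure` are hypothetical names, and only standard `Stopwatch` and `Interlocked` members are used.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: sums the time spent in one phase across all threads.
public sealed class PhaseTimer
{
    private long _ticks; // Stopwatch ticks, updated atomically

    // Runs 'phase', adds its elapsed time to the total, and returns its result.
    public T Measure<T>(Func<T> phase)
    {
        long start = Stopwatch.GetTimestamp();
        try
        {
            return phase();
        }
        finally
        {
            Interlocked.Add(ref _ticks, Stopwatch.GetTimestamp() - start);
        }
    }

    // Overload for phases that return nothing.
    public void Measure(Action phase)
    {
        Measure<object>(() => { phase(); return null; });
    }

    // Total time across all threads; in a parallel run this can exceed wall-clock time.
    public TimeSpan Total
    {
        get { return TimeSpan.FromSeconds((double)Interlocked.Read(ref _ticks) / Stopwatch.Frequency); }
    }
}
```

With one timer per phase, the question's loop body could wrap the calls as buildTimer.Measure(() => docBuilder.Build(xr)) and runTimer.Measure(() => transformer.Run(serializer)). Note that the second figure would include serialization to the output file as well as the transformation itself.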
The .NET version of Saxon is known to be significantly slower than the Java version. Years ago there used to be an overhead of about 30%, but this seems to have increased so it is now 3-5 times slower, and despite considerable efforts, we haven't managed to work out why. We're very dependent here on the IKVMC cross-compiler technology and the OpenJDK library.
Upvotes: 1