BRW

Reputation: 199

How can I speed up loading of XML documents from disk?

I'm trying to speed up a VB6 XML parser. The XML files to be parsed are stored on my local hard drive. The If xDOC.Load(objFile.Path) Then statement below is taking a very long time, according to my profiling results. That statement took 34.5 seconds overall during processing of a small batch of 100 XML files. A sample XML file is here. Can this code be improved to speed up XML file loading, or is the loading speed constrained by the nature of the XML files themselves?

Option Explicit

Dim objFSO As Object
Dim objFolder As Object
Dim objFile As Object

Dim xDOC As MSXML2.DOMDocument

Dim xPE As MSXML2.IXMLDOMParseError

Sub Main()

Set xDOC = New DOMDocument

Set objFSO = CreateObject("Scripting.FileSystemObject")

Set objFolder = objFSO.GetFolder("C:\My XML File Folder")

For Each objFile In objFolder.Files
    Set xDOC = New DOMDocument
    xDOC.async = False

    If xDOC.Load(objFile.Path) Then
        ' process the file
    Else
        ' XML file failed to load; log error and continue with next file
    End If

   Set xDOC = Nothing

Next objFile

Set objFolder = Nothing

End Sub

Upvotes: 3

Views: 1446

Answers (2)

William Walseth

Reputation: 2923

To measure the raw load time for a single file, run this command on Windows:

cscript.exe testperf.js testfile.xml 1000

Here's testperf.js:

var aArguments = WScript.Arguments;
var xmlDoc;
var xslDoc;

function loadXMLFile( strFileName ) {
    var xml = new ActiveXObject("MSXML2.FreeThreadedDOMDocument");
    xml.setProperty("SelectionNamespaces", "xmlns:ms='urn:schemas-microsoft-com:xslt'");
    xml.validateOnParse = false;
    xml.resolveExternals = false;
    xml.preserveWhiteSpace = false;
    if( !xml.load( strFileName ) ) {
        var strError = "";
        var facility = xml.parseError.errorCode>>16 & 0x1FFF;
        var code = xml.parseError.errorCode & 0xFFFF;
        strError = 'Error loading: ' + strFileName + '\r\n';
        strError += xml.parseError.reason;
        strError += "Facility: " + facility + " Code: " + code + "\r\n";
        strError += xml.parseError.srcText + "\r\n";
        strError += xml.parseError.url + "\r\n";
        strError += "Line: " + xml.parseError.line +  " Position: " + xml.parseError.linepos + "\r\n";
        throw new Error( xml.parseError.errorCode, strError );
    }
    return xml;
}

try {
    if( aArguments.length < 2 ) {
        WScript.Echo( "Usage: testperf file.xml loadcount" );
        WScript.Quit( 1 );
    }
    var strStatus = 'Loading XML';

    var dtStart = new Date().valueOf();
    var nLoop = parseInt( aArguments(1), 10 ); // explicit radix: the load count is decimal
    for( var i = 0; i < nLoop; i++ ) {
        xmlDoc = loadXMLFile( aArguments(0) );
    }
    var dtStop = new Date().valueOf();
    WScript.Echo( nLoop + " XML loads took " + parseFloat( (dtStop - dtStart) / 1000 ).toFixed( 2 ) + " seconds.");

}
catch( e ) {
    // aArguments(0) is the XML file name; aArguments(1) is the load count
    WScript.Echo( 'Error in file: ' + aArguments(0) + '\n' + e.number + " " + e.description );
    WScript.Quit( 1 );
}

Upvotes: 1

William Walseth

Reputation: 2923

Your XML is fine. It's actually quite small and loads very quickly with the proper document settings.

I did notice the DTD, which is re-downloaded from http://patents.ic.gc.ca/cipo/dtd/ca-patent-document-v2-0.dtd every time you load a file. Moreover, the DTD itself references other DTD files, so you're likely downloading those too.

MSXML does a lot of extra work by default, but if your XML is known to be "good", the fastest way to load it is to set the following properties to false before calling Load(). That way the parser only checks that the XML is well-formed.

var doc = new ActiveXObject("MSXML2.DOMDocument");
doc.validateOnParse = false;  // don't validate
doc.resolveExternals = false; // don't even download external files (DTDs...)
doc.preserveWhiteSpace = false; // don't try to preserve formatting.
doc.load("somexml.xml");

Hope this helps you out. You should be able to translate it to VB6.
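
For reference, here is a rough sketch of how those settings might look when applied to the loop from the question. It assumes the project already references an MSXML version that exposes MSXML2.DOMDocument, as the question's code implies. The Debug.Print line is only an illustrative placeholder for whatever error logging you use. I have not timed this against the sample files, so treat it as a starting point rather than a measured fix.

```vb
Option Explicit

' Sketch only: assumes a project reference to an MSXML library that
' provides MSXML2.DOMDocument (as used in the question).

Sub Main()
    Dim objFSO As Object
    Dim objFolder As Object
    Dim objFile As Object
    Dim xDOC As MSXML2.DOMDocument

    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFolder = objFSO.GetFolder("C:\My XML File Folder")

    For Each objFile In objFolder.Files
        Set xDOC = New MSXML2.DOMDocument
        xDOC.async = False               ' load synchronously
        xDOC.validateOnParse = False     ' check well-formedness only
        xDOC.resolveExternals = False    ' do not fetch the external DTD
        xDOC.preserveWhiteSpace = False  ' do not preserve formatting white space

        If xDOC.Load(objFile.Path) Then
            ' process the file
        Else
            ' placeholder logging: report the parser's reason and continue
            Debug.Print objFile.Path & ": " & xDOC.parseError.reason
        End If

        Set xDOC = Nothing
    Next objFile

    Set objFolder = Nothing
    Set objFSO = Nothing
End Sub
```

All four properties have to be set before Load() is called, because they control how the parser behaves during that call. If your processing depends on anything declared in the DTD, it is worth checking a few files by hand after turning resolveExternals off.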

Upvotes: 4
