Dan
Dan

Reputation: 773

Efficiency of node removal vba DOM

I have a set of approximately 17,000 nodes. I want to select a random number, then remove the node with that random index number. At the moment what I have works. It is:

'some code to create a DOM as "xDoc",
'create a node object as "node",
'get total nodes as variable "total",
'get paths to a node as "xPath"

'Loop to reduce nodes to 1000
Do While total > 1000
    'some code to get a random number as variable "num"

    'initialize node object
    Set node = xDoc.SelectSingleNode(xPath & "[" & CStr(num) & "]”)

    'remove the node
    node.ParentNode.RemoveChild node

Loop

This takes too long. Close to an hour and a half and in the future my xml is going to grow exponentially. It runs, removes nodes correctly just sloooowwwwwwww. I was thinking there must be some way to create a nodelist and add to it based on the random numbers then select the final nodelist from the doc to delete all at once to make it quicker. Does this make sense? Or maybe there’s a quicker more efficient way than I’m thinking?

I apologize my comment lines are not appearing as comments...I hope they don’t make things confusing.

Upvotes: 0

Views: 146

Answers (2)

Parfait
Parfait

Reputation: 107767

Consider XSLT 1.0 which can be run in Excel VBA using its MSXML library. Sibling to XPath, XSLT is a special-purpose language designed to transform XML files like removing nodes. Various softwares (SAS, Excel, etc.) and programming languages (Java, Python, etc.) usually carry XSLT 1.0 libraries. To use XSLT 2.0 and 3.0 you will need to run the transformation externally possibly command line calls to dedicated processors such as Saxon or Exslt.

Specifically, XSLT receives your random number into a parameter and then runs the Identity Transform to copy input XML as is. At end, the specific node is removed with empty template by position(). Be sure to change mynode to actual node name (no full XPath).

XSLT (save as .xsl, a special xml file)

<?xml version="1.0" ?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">  
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="rand_num" />

  <!-- Identity Transform -->
  <xsl:template match="@*|node()">
     <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
     </xsl:copy>
  </xsl:template> 

  <!-- Remove Node by Position --> 
  <xsl:template match="mynode[position()=$rand_num]"/>

</xsl:stylesheet>

VBA

Public Sub RunXSLT()
    ' REFERENCE Microsoft XML, v3.0 OR v6.0
    Dim xmlDoc As New MSXML2.DOMDocument60, xslDoc As New MSXML2.DOMDocument60, newDoc As New MSXML2.DOMDocument60

    'some code to get a random number as variable "num"

    ' LOAD INPUT XML
    xmlDoc.Load "C:\Path\To\Input.xml"
    xmlDoc.async = False

    ' LOAD XSLT SCRIPT AND PASS RANDOM NUMBER
    xslDoc.Load "C:\Path\To\XSLTScript.xsl"
    xslDoc.async = False
    xslDoc.addParameter "rand_num", num  

    ' TRANSFORM INPUT AND SAVE OUTPUT TO FILE
    xmlDoc.transformNodeToObject xslDoc, newDoc
    newDoc.Save "C:\Path\To\Output.xml"

    Set newDoc = Nothing: Set xslDoc = Nothing: Set xmlDoc = Nothing

End Sub

Upvotes: 1

Michael Kay
Michael Kay

Reputation: 163675

It's possible that most of the cost is in compiling the XPath expressions rather than actually deleting the nodes. Try an approach where you compile an XPath expression with a parameter once, and then execute it repeatedly with different parameters.

But I think a better approach is this, in XSLT 3.0:

<xsl:variable name="all" select="/*/*"/>
<xsl:variable name="retain" 
  select="random-number-generator()?permute($all)[position() le 1000]"/>
<xsl:copy-of select="$retain"/>

Upvotes: 2

Related Questions