Reputation: 773
I have a set of approximately 17,000 nodes. I want to select a random number, then remove the node with that random index number. At the moment what I have works. It is:
'some code to create a DOM as "xDoc",
'create a node object as "node",
'get total nodes as variable "total",
'get paths to a node as "xPath"
'Loop to reduce nodes to 1000
Do While total > 1000
'some code to get a random number as variable "num"
'initialize node object
Set node = xDoc.SelectSingleNode(xPath & "[" & CStr(num) & "]”)
'remove the node
node.ParentNode.RemoveChild node
Loop
This takes too long. Close to an hour and a half and in the future my xml is going to grow exponentially. It runs, removes nodes correctly just sloooowwwwwwww. I was thinking there must be some way to create a nodelist and add to it based on the random numbers then select the final nodelist from the doc to delete all at once to make it quicker. Does this make sense? Or maybe there’s a quicker more efficient way than I’m thinking?
I apologize my comment lines are not appearing as comments...I hope they don’t make things confusing.
Upvotes: 0
Views: 146
Reputation: 107767
Consider XSLT 1.0 which can be run in Excel VBA using its MSXML library. Sibling to XPath, XSLT is a special-purpose language designed to transform XML files like removing nodes. Various softwares (SAS, Excel, etc.) and programming languages (Java, Python, etc.) usually carry XSLT 1.0 libraries. To use XSLT 2.0 and 3.0 you will need to run the transformation externally possibly command line calls to dedicated processors such as Saxon or Exslt.
Specifically, XSLT receives your random number into a parameter and then runs the Identity Transform to copy input XML as is. At end, the specific node is removed with empty template by position()
. Be sure to change mynode to actual node name (no full XPath).
XSLT (save as .xsl, a special xml file)
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:param name="rand_num" />
<!-- Identity Transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- Remove Node by Position -->
<xsl:template match="mynode[position()=$rand_num]"/>
</xsl:stylesheet>
VBA
Public Sub RunXSLT()
' REFERENCE Microsoft XML, v3.0 OR v6.0
Dim xmlDoc As New MSXML2.DOMDocument60, xslDoc As New MSXML2.DOMDocument60, newDoc As New MSXML2.DOMDocument60
'some code to get a random number as variable "num"
' LOAD INPUT XML
xmlDoc.Load "C:\Path\To\Input.xml"
xmlDoc.async = False
' LOAD XSLT SCRIPT AND PASS RANDOM NUMBER
xslDoc.Load "C:\Path\To\XSLTScript.xsl"
xslDoc.async = False
xslDoc.addParameter "rand_num", num
' TRANSFORM INPUT AND SAVE OUTPUT TO FILE
xmlDoc.transformNodeToObject xslDoc, newDoc
newDoc.Save "C:\Path\To\Output.xml"
Set newDoc = Nothing: Set xslDoc = Nothing: Set xmlDoc = Nothing
End Sub
Upvotes: 1
Reputation: 163675
It's possible that most of the cost is in compiling the XPath expressions rather than actually deleting the nodes. Try an approach where you compile an XPath expression with a parameter once, and then execute it repeatedly with different parameters.
But I think a better approach is this, in XSLT 3.0:
<xsl:variable name="all" select="/*/*"/>
<xsl:variable name="retain"
select="random-number-generator()?permute($all)[position() le 1000]"/>
<xsl:copy-of select="$retain"/>
Upvotes: 2