Deleting XML Lines in VBScript

Question

I got the following piece of code in VBScript:

Set fso    = CreateObject("Scripting.FileSystemObject")
Set xmlDoc = CreateObject("Msxml2.DOMDocument.6.0")
xmlDoc.Async = False
xmlDoc.setProperty "SelectionLanguage", "XPath"

For Each f In fso.GetFolder("C:\Users\Admin\Folder").Files
    If LCase(fso.GetExtensionName(f)) = "xml" Then
        xmlDoc.Load f.Path

        If xmlDoc.ParseError = 0 Then

            'Some code in here

        Else
            WScript.Echo "Parsing error! '" & f.Path & "': " & xmlDoc.ParseError.Reason

        End If
    End If
Next

I'm doing some operations on the XML files inside that directory, but I need to do one thing to all of them first: delete lines. Something like:

@EDIT (Now with NODE1 being the real sample):

    
    
    
             
         Information1  
         Information2  
         Information3 
      
      
        XXX

Turning into:

XXX

The XMLs always have 6 lines between the "xml version" node and NODE2.2. What I intend to do is delete these lines (including the ""), and the last line of the file, which would always be .

I've tried deleting nodes, as some posts on this site suggest, but XPath queries don't work on the files unless I delete these lines first. That's why I need to think in terms of "lines" to delete; otherwise, it's impossible. I really don't know what is so horrible in these lines that it keeps my program from finding my paths, but when I exclude them, it works.

I think now I have made myself a little bit more clear...

Can someone please help me?

Ekkehard.Horner · Accepted Answer

If you started your XML-related scripts with a skeleton like:

  Dim goFS   : Set goFS  = CreateObject("Scripting.FileSystemObject")
  Dim sFSpec : sFSpec    = goFS.GetAbsolutePathName("..\testdata\xml\20383899.xml")
  Dim oXDoc  : Set oXDoc = CreateObject("Msxml2.DOMDocument.6.0")
  oXDoc.async = False
  oXDoc.load sFSpec

  If 0 = oXDoc.ParseError Then
     WScript.Echo "ready to process"
  Else
     WScript.Echo oXDoc.parseError.reason
  End If

you'd immediately see that your .XML is not well-formed: "NODE 1.2.3" isn't a name, the NODE2.1 nodes aren't closed, and NODE2.2 can't be closed with /NODE2.1.
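To pinpoint where a file stops being well-formed, the parseError object can report more than the reason text. The following is only a sketch: the three-line sample string is made up for illustration (it is not the asker's file) and deliberately closes an element with the wrong tag.

```vbscript
Option Explicit

Dim oXDoc : Set oXDoc = CreateObject("Msxml2.DOMDocument.6.0")
oXDoc.async = False

' Hypothetical sample, malformed on purpose: <b> is closed with </c>
oXDoc.loadXML "<a>" & vbCrLf & "  <b>text</c>" & vbCrLf & "</a>"

With oXDoc.parseError
   If .errorCode <> 0 Then
      ' line and linepos give the position where the parser gave up,
      ' srcText holds the text of the offending line
      WScript.Echo "Error code:", .errorCode
      WScript.Echo "Reason    :", Trim(.reason)
      WScript.Echo "Line/Col  :", .line & "/" & .linepos
      WScript.Echo "Source    :", .srcText
   Else
      WScript.Echo "well-formed"
   End If
End With
```

Run against a real file (using .load instead of .loadXML), the line and column usually point straight at the tag that has to be fixed.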

So your .XML should look like:

XXX

I'm confident that such well-formed .XML can be modified to your desired result, but I don't understand your specs: should the NODE1 be 'deleted'/the XML reduced to NODE2.2?

Added to eat my pudding:

A bit cheating, but if this code fragment is inserted in the skeleton:

  If 0 = oXDoc.ParseError Then
     WScript.Echo "ready to process"
     Dim sXPath : sXPath    = "/NODE1/NODE2.2"
     Dim ndFnd  : Set ndFnd = oXDoc.SelectSingleNode(sXPath)
     If ndFnd Is Nothing Then
        WScript.Echo sXpath, "not found"
     Else
        Set oXDoc.documentElement = ndFnd
        WScript.Echo oXDoc.xml
     End If
  Else

the result:

XXX

conforms to (one interpretation of) your specs. If you can't force the XML's author to obey the standards, you should pre-process the bad XML using text/string operations (RegExp, Replace, ...) and then do the transformations in the usual way. (I admit to having no idea how to write a RegExp that corrects arbitrary 'wrong tag used to close' blunders.)

Update I:

To show the feasibility of the strategy "transform the garbage to valid XML and process that", I wrote this ad hoc script:

Option Explicit

Dim goFS   : Set goFS  = CreateObject("Scripting.FileSystemObject")
Dim sFSpec : sFSpec    = goFS.GetAbsolutePathName("..\testdata\xml\20383899.org.xml")
Dim sAll   : sAll      = goFS.OpenTextFile(sFSpec).ReadAll()
WScript.Echo "-------------------- garbage in"
WScript.Echo sAll

Dim reZapBlanks : Set reZapBlanks = New RegExp
reZapBlanks.Global     = True
reZapBlanks.Pattern    = "(NODE)(\s+)(\d)"
sAll = reZapBlanks.Replace(sAll, "$1$3")
Dim reAddClose : Set reAddClose = New RegExp
reAddClose.Global     = True
reAddClose.Pattern    = "()"
sAll = reAddClose.Replace(sAll, "$1/$2")
Dim reVoodoo : Set reVoodoo = New RegExp
reVoodoo.Global     = False
reVoodoo.Pattern    = "([\s\S]+)()"
sAll = reVoodoo.Replace(sAll, "$1")
WScript.Echo "-------------------- nice XML out"
WScript.Echo sAll

Dim oXDoc  : Set oXDoc = CreateObject("Msxml2.DOMDocument.6.0")
oXDoc.setProperty "SelectionLanguage", "XPath"
oXDoc.async = False
oXDoc.loadxml sAll ' <-- clean XML

If 0 = oXDoc.ParseError Then
   WScript.Echo "ready to process"
   Dim sXPath : sXPath    = "/NODE1/NODE2.2"
   Dim ndFnd  : Set ndFnd = oXDoc.SelectSingleNode(sXPath)
   If ndFnd Is Nothing Then
      WScript.Echo sXpath, "not found"
   Else
      Set oXDoc.documentElement = ndFnd
      WScript.Echo "-------------------- condensed using std XML methods"
      sAll = oXDoc.xml
      WScript.Echo sAll
      oXDoc.loadxml sAll ' <-- condensed XML
      WScript.Echo "-------------------- sanity check"
      WScript.Echo "Error:", oXDoc.ParseError.errorCode
   End If
Else
   WScript.Echo oXDoc.parseError.reason
End If

output:

cscript 20383899.vbs
-------------------- garbage in



  
    
    
    
  
  
     XXX 
  


-------------------- nice XML out



  
    
    
    
  
  
     XXX 
  


ready to process
-------------------- condensed using std XML methods



         XXX 


-------------------- sanity check
Error: 0

The RegExps are tailored to this specific garbage; I don't claim that the next bad XML can be cleaned in a similar way.

Update II:

The last version of @Charlie's XML input is well-formed. So it can be processed using XML methods (XPATH to find the NODE2.2 node and assignment to .documentElement to reduce/condense the .XML file to that node). So all the above rigmarole isn't needed.
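For well-formed input, the folder loop from the question and the documentElement assignment shown above could be combined roughly as follows. This is a minimal sketch, not a tested solution: it assumes every file has the /NODE1/NODE2.2 structure discussed here, and it overwrites each file in place, so it is best tried on copies first.

```vbscript
Option Explicit

Const sFolder = "C:\Users\Admin\Folder"   ' folder taken from the question
Const sXPath  = "/NODE1/NODE2.2"          ' XPath taken from the fragment above

Dim goFS  : Set goFS  = CreateObject("Scripting.FileSystemObject")
Dim oXDoc : Set oXDoc = CreateObject("Msxml2.DOMDocument.6.0")
oXDoc.async = False
oXDoc.setProperty "SelectionLanguage", "XPath"

Dim oFile, ndFnd
For Each oFile In goFS.GetFolder(sFolder).Files
   If LCase(goFS.GetExtensionName(oFile.Name)) = "xml" Then
      ' load returns False if the file is not well-formed
      If oXDoc.load(oFile.Path) Then
         Set ndFnd = oXDoc.selectSingleNode(sXPath)
         If ndFnd Is Nothing Then
            WScript.Echo oFile.Path & ": " & sXPath & " not found"
         Else
            ' Make the found node the new root, then write the file back
            Set oXDoc.documentElement = ndFnd
            oXDoc.save oFile.Path
         End If
      Else
         WScript.Echo oFile.Path & ": " & oXDoc.parseError.reason
      End If
   End If
Next
```

Files that fail to parse are only reported, not touched, so the string-based clean-up from Update I could still be applied to those separately.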

I hope that the history of this question will make everybody think twice when the uncouth concept of "deleting lines from XML" rears its ugly head.
