user3045856

Deleting XML Lines in VBScript

I have the following piece of code in VBScript:

Set fso = CreateObject("Scripting.FileSystemObject")
Set xmlDoc = CreateObject("Msxml2.DOMDocument.6.0")
xmlDoc.Async = False
xmlDoc.setProperty "SelectionLanguage", "XPath"

For Each f In fso.GetFolder("C:\Users\Admin\Folder").Files
    If LCase(fso.GetExtensionName(f)) = "xml" Then
        xmlDoc.Load f.Path

        If xmlDoc.ParseError = 0 Then

            'Some code in here

        Else
            WScript.Echo "Parsing error! '" & f.Path & "': " & xmlDoc.ParseError.Reason

        End If
    End If
Next

I'm doing some operations on the XML files inside that directory, but I need to do one thing with all these XML files first: delete lines. Something like:

@EDIT (Now with NODE1 being the real sample):

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
    <tadig-raex-21:TADIGRAEXIR21 xmlns="https://XXX" xmlns:tadig-raex-21="https://XXX" xmlns:tadig-gen="https://YYY" xmlns:xsi="ZZZ" xsi:schemaLocation="https://XXX tadig-raex-ir21-8.2.xsd">
      <NODE2.1>       
        <NODE2.1.1> Information1 </NODE2.1.1> 
        <NODE2.1.2> Information2 </NODE2.1.2> 
        <NODE2.1.3> Information3 </NODE2.1.3>
      </NODE2.1>
      <NODE2.2>
        <NODE2.2.1>XXX</NODE2.2.1>
      </NODE2.2>
   </tadig-raex-21:TADIGRAEXIR21>

Turning into:

<?xml version="1.0" encoding="UTF-8"?>
      <NODE2.2>
        <NODE 2.2.1> XXX </NODE 2.2.1>
      </NODE2.1>

The XMLs always have 6 lines between the "xml version" node and NODE2.2. What I intend to do is delete these lines (including the root element's opening tag), and the last line of the file, which is always the root element's closing tag.

I've tried deleting nodes, as some posts on this site suggest, but XPath queries don't work unless I delete these lines first. That's why I need to think in terms of "lines" to delete; otherwise it's impossible. I really don't know what is so horrible about these lines that keeps my program from finding my paths, but when I remove them, it works.

I think now I have made myself a little bit more clear...

Can someone please help me?

Upvotes: 0

Views: 966

Answers (1)

Ekkehard.Horner

Reputation: 38745

If you started your XML-related scripts with a skeleton like:

  Dim goFS   : Set goFS  = CreateObject("Scripting.FileSystemObject")
  Dim sFSpec : sFSpec    = goFS.GetAbsolutePathName("..\testdata\xml\20383899.xml")
  Dim oXDoc  : Set oXDoc = CreateObject("Msxml2.DOMDocument.6.0")
  oXDoc.async = False
  oXDoc.load sFSpec

  If 0 = oXDoc.ParseError Then
     WScript.Echo "ready to process"
  Else
     WScript.Echo oXDoc.parseError.reason
  End If

you'd immediately see that your .XML is not well-formed: "NODE 1.2.3" isn't a name, the NODE2.1 nodes aren't closed, and NODE2.2 can't be closed with /NODE2.1.
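
For a more precise diagnosis, the parseError object also reports where the parser gave up. A minimal sketch, assuming a deliberately malformed inline string (sBad is made up for illustration) in place of your file:

```vbscript
Option Explicit

' Hypothetical malformed input: <b> is closed by </a>
Dim sBad : sBad = "<a>" & vbCrLf & "  <b>text</a>" & vbCrLf & "</a>"

Dim oXDoc : Set oXDoc = CreateObject("Msxml2.DOMDocument.6.0")
oXDoc.async = False
oXDoc.loadXML sBad

With oXDoc.parseError
   If 0 = .errorCode Then
      WScript.Echo "ready to process"
   Else
      ' Report what went wrong and where the parser stopped
      WScript.Echo "Code:  ", .errorCode
      WScript.Echo "Reason:", .reason
      WScript.Echo "Line:  ", .line, "Column:", .linepos
      WScript.Echo "Source:", .srcText
   End If
End With
```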

So your .XML should look like:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
<NODE1>
  <NODE2.1>
    <NODE2.1.1/>
    <NODE2.1.2/>
    <NODE2.1.3/>
  </NODE2.1>
  <NODE2.2>
    <NODE2.2.1> XXX </NODE2.2.1>
  </NODE2.2>
</NODE1>

I'm confident that such well-formed .XML can be modified to your desired result, but I don't understand your specs: should the NODE1 be 'deleted'/the XML reduced to NODE2.2?

Added to eat my pudding:

A bit cheating, but if this code fragment is inserted in the skeleton:

  If 0 = oXDoc.ParseError Then
     WScript.Echo "ready to process"
     Dim sXPath : sXPath    = "/NODE1/NODE2.2"
     Dim ndFnd  : Set ndFnd = oXDoc.SelectSingleNode(sXPath)
     If ndFnd Is Nothing Then
        WScript.Echo sXpath, "not found"
     Else
        Set oXDoc.documentElement = ndFnd
        WScript.Echo oXDoc.xml
     End If
  Else

the result:

<?xml version="1.0"?>
<!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
<NODE2.2>
        <NODE2.2.1> XXX </NODE2.2.1>
</NODE2.2>

conforms to (one interpretation of) your specs. If you can't force the XML's author to obey the standards, you should pre-process the bad XML using text/string ops (RegExp, Replace, ...) and then do the transformations in the usual way. (I admit to having no idea with regard to a RegExp that corrects arbitrary 'wrong tag used to close' blunders.)

Update I:

To show the feasibility of the strategy "transform the garbage to valid XML and process that", I wrote this ad hoc script:

Option Explicit

Dim goFS   : Set goFS  = CreateObject("Scripting.FileSystemObject")
Dim sFSpec : sFSpec    = goFS.GetAbsolutePathName("..\testdata\xml\20383899.org.xml")
Dim sAll   : sAll      = goFS.OpenTextFile(sFSpec).ReadAll()
WScript.Echo "-------------------- garbage in"
WScript.Echo sAll

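' Remove the blanks between "NODE" and the digits: "NODE 2.1.1" -> "NODE2.1.1"
Dim reZapBlanks : Set reZapBlanks = New RegExp
reZapBlanks.Global     = True
reZapBlanks.Pattern    = "(NODE)(\s+)(\d)"
sAll = reZapBlanks.Replace(sAll, "$1$3")
' Turn the unclosed <NODE2.1.n> tags into empty elements: <NODE2.1.n/>
Dim reAddClose : Set reAddClose = New RegExp
reAddClose.Global     = True
reAddClose.Pattern    = "(<NODE2\.1\.\d+)(>)"
sAll = reAddClose.Replace(sAll, "$1/$2")
' Replace the last </NODE2.1> (the wrong close for <NODE2.2>) with </NODE2.2>;
' the greedy [\s\S]+ spans everything after the first </NODE2.1>
Dim reVoodoo : Set reVoodoo = New RegExp
reVoodoo.Global     = False
reVoodoo.Pattern    = "(</NODE2\.1>[\s\S]+)(</NODE2\.1>)"
sAll = reVoodoo.Replace(sAll, "$1</NODE2.2>")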
WScript.Echo "-------------------- nice XML out"
WScript.Echo sAll

Dim oXDoc  : Set oXDoc = CreateObject("Msxml2.DOMDocument.6.0")
oXDoc.setProperty "SelectionLanguage", "XPath"
oXDoc.async = False
oXDoc.loadxml sAll ' <-- clean XML

If 0 = oXDoc.ParseError Then
   WScript.Echo "ready to process"
   Dim sXPath : sXPath    = "/NODE1/NODE2.2"
   Dim ndFnd  : Set ndFnd = oXDoc.SelectSingleNode(sXPath)
   If ndFnd Is Nothing Then
      WScript.Echo sXpath, "not found"
   Else
      Set oXDoc.documentElement = ndFnd
      WScript.Echo "-------------------- condensed using std XML methods"
      sAll = oXDoc.xml
      WScript.Echo sAll
      oXDoc.loadxml sAll ' <-- condensed XML
      WScript.Echo "-------------------- sanity check"
      WScript.Echo "Error:", oXDoc.ParseError.errorCode
   End If
Else
   WScript.Echo oXDoc.parseError.reason
End If

output:

cscript 20383899.vbs
-------------------- garbage in
<?xml version="1.0" encoding="UTF-8"?>
<!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
<NODE1>
  <NODE2.1>
    <NODE 2.1.1>
    <NODE 2.1.2>
    <NODE 2.1.3>
  </NODE2.1>
  <NODE2.2>
    <NODE 2.2.1> XXX </NODE 2.2.1>
  </NODE2.1>
</NODE1>

-------------------- nice XML out
<?xml version="1.0" encoding="UTF-8"?>
<!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
<NODE1>
  <NODE2.1>
    <NODE2.1.1/>
    <NODE2.1.2/>
    <NODE2.1.3/>
  </NODE2.1>
  <NODE2.2>
    <NODE2.2.1> XXX </NODE2.2.1>
  </NODE2.2>
</NODE1>

ready to process
-------------------- condensed using std XML methods
<?xml version="1.0"?>
<!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
<NODE2.2>
        <NODE2.2.1> XXX </NODE2.2.1>
</NODE2.2>

-------------------- sanity check
Error: 0

The RegExps are tailored to this specific garbage; I don't claim that the next bad XML can be cleaned in a similar way.

Update II:

The last version of @Charlie's XML input is well-formed. So it can be processed using XML methods (XPATH to find the NODE2.2 node and assignment to .documentElement to reduce/condense the .XML file to that node). So all the above rigmarole isn't needed.
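
A minimal sketch of that approach follows. One assumption goes beyond what is stated above: because the real sample declares a default namespace (xmlns="https://XXX"), an XPath with unprefixed names will probably not match in MSXML 6, which is a likely reason the paths were not found. The sketch therefore binds a made-up prefix d to that URI via the SelectionNamespaces property. The inline sXml string is a shortened, hypothetical stand-in for the real file, and https://XXX is the placeholder from the question, not a real URI.

```vbscript
Option Explicit

' Shortened stand-in for the real file (hypothetical); "https://XXX" is the
' placeholder namespace URI from the question.
Dim sXml : sXml = _
    "<tadig-raex-21:TADIGRAEXIR21 xmlns=""https://XXX""" & _
    " xmlns:tadig-raex-21=""https://XXX"">" & _
    "<NODE2.1><NODE2.1.1>Information1</NODE2.1.1></NODE2.1>" & _
    "<NODE2.2><NODE2.2.1>XXX</NODE2.2.1></NODE2.2>" & _
    "</tadig-raex-21:TADIGRAEXIR21>"

Dim oXDoc : Set oXDoc = CreateObject("Msxml2.DOMDocument.6.0")
oXDoc.async = False
oXDoc.setProperty "SelectionLanguage", "XPath"
' Bind the prefix "d" (an arbitrary choice) to the document's default namespace
oXDoc.setProperty "SelectionNamespaces", "xmlns:d='https://XXX'"
oXDoc.loadXML sXml

If 0 = oXDoc.parseError.errorCode Then
   ' /* is the root element, whatever its name; d:NODE2.2 is its namespaced child
   Dim ndFnd : Set ndFnd = oXDoc.selectSingleNode("/*/d:NODE2.2")
   If ndFnd Is Nothing Then
      WScript.Echo "NODE2.2 not found"
   Else
      Set oXDoc.documentElement = ndFnd   ' reduce the document to NODE2.2
      WScript.Echo oXDoc.xml
   End If
Else
   WScript.Echo oXDoc.parseError.reason
End If
```

To process real files, replace the loadXML call with load and a file path, and use the actual namespace URI from the file's root element.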

I hope that the history of this question will make everybody think twice when the uncouth concept of "deleting lines from XML" rears its ugly head.

Upvotes: 3
