Reputation:
I have the following piece of VBScript code:
Set fso = CreateObject("Scripting.FileSystemObject") ' assumed; not shown in the original snippet
Set xmlDoc = CreateObject("Msxml2.DOMDocument.6.0")
xmlDoc.Async = False ' a Boolean, not the string "False"
xmlDoc.setProperty "SelectionLanguage", "XPath"
For Each f In fso.GetFolder("C:\Users\Admin\Folder").Files
    If LCase(fso.GetExtensionName(f)) = "xml" Then
        xmlDoc.Load f.Path
        If xmlDoc.ParseError = 0 Then
            'Some code in here
        Else
            WScript.Echo "Parsing error! '" & f.Path & "': " & xmlDoc.ParseError.Reason
        End If
    End If
Next
I'm performing some operations on the XML files in that directory, but I need to do one thing to all of them first: delete lines. Something like:
@EDIT (Now with NODE1 being the real sample):
<?xml version="1.0" encoding="UTF-8"?>
<!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
<tadig-raex-21:TADIGRAEXIR21 xmlns="https://XXX" xmlns:tadig-raex-21="https://XXX" xmlns:tadig-gen="https://YYY" xmlns:xsi="ZZZ" xsi:schemaLocation="https://XXX tadig-raex-ir21-8.2.xsd">
<NODE2.1>
<NODE2.1.1> Information1 </NODE2.1.1>
<NODE2.1.2> Information2 </NODE2.1.2>
<NODE2.1.3> Information3 </NODE2.1.3>
</NODE2.1>
<NODE2.2>
<NODE2.2.1>XXX</NODE2.2.1>
</NODE2.2>
</tadig-raex-21:TADIGRAEXIR21>
Turning into:
<?xml version="1.0" encoding="UTF-8"?>
<NODE2.2>
<NODE 2.2.1> XXX </NODE 2.2.1>
</NODE2.1>
The XML files always have 6 lines between the "xml version" declaration and NODE2.2. What I intend to do is delete these lines (including the root element's opening tag) and the last line of the file, which is always the root element's closing tag.
I've tried deleting nodes, as some posts on this site suggest, but XPath queries don't work unless I delete these lines first. That's why I need to think in terms of "lines" to delete; otherwise it seems impossible. I really don't know what is so horrible about these lines that keeps my program from finding my paths, but when I remove them, the queries work.
I think now I have made myself a little bit more clear...
Can someone please help me?
Upvotes: 0
Views: 966
Reputation: 38745
If you started your XML-related scripts with a skeleton like:
Dim goFS : Set goFS = CreateObject("Scripting.FileSystemObject")
Dim sFSpec : sFSpec = goFS.GetAbsolutePathName("..\testdata\xml\20383899.xml")
Dim oXDoc : Set oXDoc = CreateObject("Msxml2.DOMDocument.6.0")
oXDoc.async = False
oXDoc.load sFSpec
If 0 = oXDoc.ParseError Then
    WScript.Echo "ready to process"
Else
    WScript.Echo oXDoc.parseError.reason
End If
you'd immediately see that your .XML is not well-formed: a name containing a blank, such as "NODE 2.1.1", isn't a valid element name; the child nodes of NODE2.1 aren't closed; and NODE2.2 can't be closed with /NODE2.1.
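To pin down where such a file goes wrong, the parseError object exposes more than the reason text. The following is only a minimal sketch: the inline XML string is a made-up, deliberately broken sample (not the asker's file), and it is loaded with loadXML so the fragment runs on its own.

```vbscript
Option Explicit
' Sketch: report where MSXML stopped parsing.
' The XML string below is a hypothetical sample, broken on purpose:
' NODE2.2 is closed with the wrong tag.
Dim oXDoc : Set oXDoc = CreateObject("Msxml2.DOMDocument.6.0")
oXDoc.async = False
oXDoc.loadXML "<NODE1><NODE2.2>XXX</NODE2.1></NODE1>"
With oXDoc.parseError
    If 0 = .errorCode Then
        WScript.Echo "ready to process"
    Else
        WScript.Echo "reason :", .reason   ' human-readable description
        WScript.Echo "line   :", .line     ' line of the offending text
        WScript.Echo "linepos:", .linepos  ' character position in that line
        WScript.Echo "srcText:", .srcText  ' the source line itself
    End If
End With
```

Run it with cscript; for files loaded with .load instead of .loadXML, the same properties apply.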
So your .XML should look like:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
<NODE1>
<NODE2.1>
<NODE2.1.1/>
<NODE2.1.2/>
<NODE2.1.3/>
</NODE2.1>
<NODE2.2>
<NODE2.2.1> XXX </NODE2.2.1>
</NODE2.2>
</NODE1>
I'm confident that such well-formed .XML can be modified to your desired result, but I don't understand your specs: should the NODE1 be 'deleted'/the XML reduced to NODE2.2?
Added to eat my pudding:
A bit cheating, but if this code fragment is inserted in the skeleton:
If 0 = oXDoc.ParseError Then
    WScript.Echo "ready to process"
    Dim sXPath : sXPath = "/NODE1/NODE2.2"
    Dim ndFnd : Set ndFnd = oXDoc.SelectSingleNode(sXPath)
    If ndFnd Is Nothing Then
        WScript.Echo sXPath, "not found"
    Else
        Set oXDoc.documentElement = ndFnd
        WScript.Echo oXDoc.xml
    End If
Else
the result:
<?xml version="1.0"?>
<!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
<NODE2.2>
<NODE2.2.1> XXX </NODE2.2.1>
</NODE2.2>
conforms to (one interpretation of) your specs. If you can't force the XML's author to obey the standards, you should pre-process the bad XML using text/string operations (RegExp, Replace, ...) and then do the transformations in the usual way. (I admit to having no idea of a RegExp that corrects arbitrary 'wrong tag used to close' blunders.)
Update I:
To show the feasibility of the strategy "transform the garbage into valid XML and process that", I wrote this ad hoc script:
Option Explicit
Dim goFS : Set goFS = CreateObject("Scripting.FileSystemObject")
Dim sFSpec : sFSpec = goFS.GetAbsolutePathName("..\testdata\xml\20383899.org.xml")
Dim sAll : sAll = goFS.OpenTextFile(sFSpec).ReadAll()
WScript.Echo "-------------------- garbage in"
WScript.Echo sAll
' Remove the blanks inside element names: "NODE 2.1.1" -> "NODE2.1.1"
Dim reZapBlanks : Set reZapBlanks = New RegExp
reZapBlanks.Global = True
reZapBlanks.Pattern = "(NODE)(\s+)(\d)"
sAll = reZapBlanks.Replace(sAll, "$1$3")
' Turn the unclosed NODE2.1.x start tags into empty elements: <NODE2.1.1> -> <NODE2.1.1/>
Dim reAddClose : Set reAddClose = New RegExp
reAddClose.Global = True
reAddClose.Pattern = "(<NODE2\.1\.\d+)(>)"
sAll = reAddClose.Replace(sAll, "$1/$2")
' Replace the second </NODE2.1> (the one that wrongly closes NODE2.2) with </NODE2.2>
Dim reVoodoo : Set reVoodoo = New RegExp
reVoodoo.Global = False
reVoodoo.Pattern = "(</NODE2\.1>[\s\S]+)(</NODE2\.1>)"
sAll = reVoodoo.Replace(sAll, "$1</NODE2.2>")
WScript.Echo "-------------------- nice XML out"
WScript.Echo sAll
Dim oXDoc : Set oXDoc = CreateObject("Msxml2.DOMDocument.6.0")
oXDoc.setProperty "SelectionLanguage", "XPath"
oXDoc.async = False
oXDoc.loadxml sAll ' <-- clean XML
If 0 = oXDoc.ParseError Then
    WScript.Echo "ready to process"
    Dim sXPath : sXPath = "/NODE1/NODE2.2"
    Dim ndFnd : Set ndFnd = oXDoc.SelectSingleNode(sXPath)
    If ndFnd Is Nothing Then
        WScript.Echo sXPath, "not found"
    Else
        ' Make NODE2.2 the new root, dropping NODE1 and NODE2.1
        Set oXDoc.documentElement = ndFnd
        WScript.Echo "-------------------- condensed using std XML methods"
        sAll = oXDoc.xml
        WScript.Echo sAll
        ' Re-parse the condensed text to prove it is still well-formed
        oXDoc.loadxml sAll ' <-- condensed XML
        WScript.Echo "-------------------- sanity check"
        WScript.Echo "Error:", oXDoc.ParseError.errorCode
    End If
Else
    WScript.Echo oXDoc.parseError.reason
End If
output:
cscript 20383899.vbs
-------------------- garbage in
<?xml version="1.0" encoding="UTF-8"?>
<!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
<NODE1>
<NODE2.1>
<NODE 2.1.1>
<NODE 2.1.2>
<NODE 2.1.3>
</NODE2.1>
<NODE2.2>
<NODE 2.2.1> XXX </NODE 2.2.1>
</NODE2.1>
</NODE1>
-------------------- nice XML out
<?xml version="1.0" encoding="UTF-8"?>
<!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
<NODE1>
<NODE2.1>
<NODE2.1.1/>
<NODE2.1.2/>
<NODE2.1.3/>
</NODE2.1>
<NODE2.2>
<NODE2.2.1> XXX </NODE2.2.1>
</NODE2.2>
</NODE1>
ready to process
-------------------- condensed using std XML methods
<?xml version="1.0"?>
<!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
<NODE2.2>
<NODE2.2.1> XXX </NODE2.2.1>
</NODE2.2>
-------------------- sanity check
Error: 0
The RegExps are tailored to this specific garbage; I don't claim that the next bad XML can be cleaned in a similar way.
Update II:
The last version of @Charlie's XML input is well-formed. So it can be processed using XML methods (XPATH to find the NODE2.2 node and assignment to .documentElement to reduce/condense the .XML file to that node). So all the above rigmarole isn't needed.
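One likely reason the XPath expressions found nothing in the real file is its default namespace (xmlns="https://XXX"): in MSXML an unprefixed name in an XPath step matches only elements in no namespace. The sketch below is an assumption-laden illustration, not the asker's actual script: the inline document is a cut-down copy of the sample from the question, and the prefix "d" is an arbitrary name picked for this example and mapped via the SelectionNamespaces property.

```vbscript
Option Explicit
' Sketch only: the XML is a shortened copy of the question's sample,
' and the prefix "d" is a hypothetical name chosen for this example.
Dim sXml : sXml = Join(Array( _
    "<tadig-raex-21:TADIGRAEXIR21 xmlns='https://XXX' xmlns:tadig-raex-21='https://XXX'>", _
    "<NODE2.1><NODE2.1.1> Information1 </NODE2.1.1></NODE2.1>", _
    "<NODE2.2><NODE2.2.1>XXX</NODE2.2.1></NODE2.2>", _
    "</tadig-raex-21:TADIGRAEXIR21>"), vbCrLf)

Dim oXDoc : Set oXDoc = CreateObject("Msxml2.DOMDocument.6.0")
oXDoc.async = False
oXDoc.loadXML sXml
If 0 <> oXDoc.parseError.errorCode Then
    WScript.Echo oXDoc.parseError.reason
    WScript.Quit 1
End If

' Map a prefix to the document's default namespace so XPath can address it
oXDoc.setProperty "SelectionNamespaces", "xmlns:d='https://XXX'"

' An unprefixed step looks for NODE2.2 in no namespace
If oXDoc.selectSingleNode("//NODE2.2") Is Nothing Then
    WScript.Echo "unprefixed XPath: not found"
End If

' The prefixed path addresses the namespaced elements
Dim ndFnd : Set ndFnd = oXDoc.selectSingleNode("/d:TADIGRAEXIR21/d:NODE2.2")
If ndFnd Is Nothing Then
    WScript.Echo "prefixed XPath: not found"
Else
    Set oXDoc.documentElement = ndFnd ' condense the document to NODE2.2
    WScript.Echo oXDoc.xml
End If
```

Because the root's prefix and the default namespace point to the same URI in the sample, a single mapping covers both steps of the path. Expect the serialized NODE2.2 to keep a namespace declaration, since the element still belongs to that namespace.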
I hope that the history of this question will make everybody think twice when the uncouth concept of "deleting lines from XML" rears its ugly head.
Upvotes: 3