Reputation: 169
I have an XML file from which I would like to retrieve all unique paths. In the following example:
<?xml version="1.0" encoding="utf-8"?>
<views>
<invoice>
<newRa elem="0">
<createD>20150514</createD>
<modD>1234</modD>
<sample>text</sample>
</newRa>
<total>1.99</total>
</invoice>
</views>
I want to retrieve:
views/invoice/newRa/createD
views/invoice/newRa/modD
views/invoice/newRa/sample
and so on.
I have some experience with XPath, but I'm not sure how to begin setting up a Sub in VB that will do this for me. Mind you, I'm working with .NET 2.0, so LINQ is not possible.
EDIT 1:
Dim xOne As New XmlDocument
xOne.Load("d/input/oneTest.xml")
For Each rNode As XmlNode In xOne.SelectSingleNode("/")
If rNode.HasChildNodes Then
subHasChild(rNode)
End If
Next
Private Sub subHasChild(ByVal cNode As XmlNode)
Dim sNode = cNode.Name
If cNode.HasChildNodes Then
sNode = sNode + "/" + cNode.FirstChild.Name
cNode = cNode.FirstChild
subHasChild(cNode)
End If
Dim sw As New StreamWriter("d:\input\paths.txt")
sw.WriteLine(sNode)
sw.Flush() : sw.Close() : sw.Dispose()
End Sub
Upvotes: 1
Views: 257
Reputation: 169
Thank you to EVERYONE who chimed in with responses. After researching all sorts of ways to do this, I ended up using a dictionary to get all unique paths. For anyone who may come across a similar scenario, here is what I used:
Dim xdDoc As New XmlDocument
Dim sw As New StreamWriter("Output File Path")
Dim diElements As New Dictionary(Of String, Integer)
xdDoc.Load("File Path")
' Visit every element and build its path by walking up through its ancestors.
For Each elemNode As XmlNode In xdDoc.SelectNodes("//*")
    Dim current As XmlNode = elemNode
    Dim sNode As String = current.Name
    ' Stop when the parent is "invoice" or the document node.
    ' Names are compared with <>; Is tests reference identity, not string equality.
    While current.ParentNode IsNot Nothing _
        AndAlso current.ParentNode.Name <> "invoice" _
        AndAlso current.ParentNode.Name <> "#document"
        current = current.ParentNode
        sNode = current.Name + "/" + sNode
    End While
    ' Count how many times each path occurs.
    If Not diElements.ContainsKey(sNode) Then
        diElements.Add(sNode, 1)
    Else
        diElements(sNode) += 1
    End If
Next
' Write each unique path preceded by its occurrence count.
For Each pair As KeyValuePair(Of String, Integer) In diElements
    sw.WriteLine("{0} --- {1}", pair.Value, pair.Key)
Next
sw.Flush() : sw.Close() : sw.Dispose()
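A small aside on the file handling, offered only as a sketch: a Using block (available since VB 2005, so it should work on .NET 2.0) disposes the writer even if an exception is thrown partway through, which replaces the manual Flush/Close/Dispose chain. The helper name WriteCounts, its parameters, and the temp-file path in Main are made up for illustration and are not part of the code above.

```vbnet
Imports System.Collections.Generic
Imports System.IO

Module WriteCountsSketch
    ' Hypothetical helper: writes one "count --- path" line per dictionary entry,
    ' in the same format as the loop above.
    Sub WriteCounts(ByVal outputPath As String, ByVal counts As Dictionary(Of String, Integer))
        ' Using disposes the writer (which also flushes and closes it),
        ' even when an exception is thrown inside the block.
        Using sw As New StreamWriter(outputPath)
            For Each pair As KeyValuePair(Of String, Integer) In counts
                sw.WriteLine("{0} --- {1}", pair.Value, pair.Key)
            Next
        End Using
    End Sub

    Sub Main()
        ' Illustrative data only; the output file name is an assumption.
        Dim counts As New Dictionary(Of String, Integer)()
        counts.Add("newRa/createD", 1)
        WriteCounts(Path.Combine(Path.GetTempPath(), "paths.txt"), counts)
    End Sub
End Module
```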
Upvotes: 1
Reputation: 4742
This was a lot uglier than I thought. I'm not really a good programmer, but I can usually figure out how to get things done. My code is typically for small, limited-use utilities, so it just needs to work.
Note: Now updated to output only unique paths
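Private PathArray As New ArrayList
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
    Dim xDoc As New XmlDocument
    Dim Output As String = ""
    xDoc.Load("C:\inetpub\wwwroot\SqlMonitor\MonitorConfig.xml")
    NodeRecurser(xDoc.SelectSingleNode("/"))
    For Each item As String In PathArray
        Output &= item & vbCrLf
    Next
    MsgBox(Output)
    Me.Close()
End Sub
' Walk the tree depth-first and record a path for every node that has no children.
Sub NodeRecurser(xNode As XmlNode)
    If xNode.HasChildNodes Then
        For Each cNode As XmlNode In xNode.ChildNodes
            NodeRecurser(cNode)
        Next
    Else
        GetPath(xNode)
    End If
End Sub
' Build the path from element names, walking from n up to the document node.
' Non-element nodes (text, comments, the XML declaration) add nothing to the path.
Sub GetPath(n As XmlNode)
    Dim xPath As String = ""
    Do While n IsNot Nothing AndAlso n.NodeType <> XmlNodeType.Document
        If n.NodeType = XmlNodeType.Element Then
            ' Prepend this element's name, adding a separator only between names.
            If xPath.Length > 0 Then xPath = "/" & xPath
            xPath = n.Name & xPath
        End If
        n = n.ParentNode
    Loop
    ' Keep only paths that have not been recorded yet.
    If xPath.Length > 0 AndAlso Not PathArray.Contains(xPath) Then PathArray.Add(xPath)
End Sub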
Upvotes: 0
Reputation: 117084
Try this:
Dim xd = <?xml version="1.0" encoding="utf-8"?>
<views>
<invoice>
<newRa elem="0">
<createD>20150514</createD>
<modD>1234</modD>
<sample>text</sample>
</newRa>
<total>1.99</total>
</invoice>
</views>
Dim getPaths As Func(Of XElement, IEnumerable(Of String)) = Nothing
getPaths = Function(xe) _
If(xe.Elements().Any(), _
xe.Elements() _
.SelectMany( _
Function(x) getPaths(x), _
Function(x, p) xe.Name.ToString() + "/" + p) _
.Distinct(), _
{ xe.Name.ToString() })
Dim paths = getPaths(xd.Root)
It gives me:
views/invoice/newRa/createD
views/invoice/newRa/modD
views/invoice/newRa/sample
views/invoice/total
It correctly gets rid of duplicate paths.
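Since the question mentions .NET 2.0, where LINQ and XElement are not available, here is a rough sketch of the same recursion using only XmlDocument and generic collections. It is not a drop-in translation of the code above. The names CollectPaths, seen and paths are invented for this example, and a Dictionary stands in for a set because HashSet(Of T) only arrived in .NET 3.5.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module UniquePathsSketch
    ' Hypothetical helper: records the path of every leaf element at or below
    ' "node" in "paths", skipping paths that were already recorded in "seen".
    Sub CollectPaths(ByVal node As XmlNode, ByVal prefix As String, _
                     ByVal seen As Dictionary(Of String, Boolean), _
                     ByVal paths As List(Of String))
        Dim currentPath As String = node.Name
        If prefix.Length > 0 Then currentPath = prefix & "/" & node.Name

        Dim hasChildElement As Boolean = False
        For Each child As XmlNode In node.ChildNodes
            ' Only element children extend the path; text nodes are ignored.
            If child.NodeType = XmlNodeType.Element Then
                hasChildElement = True
                CollectPaths(child, currentPath, seen, paths)
            End If
        Next

        ' An element with no element children is a leaf: record its path once.
        If Not hasChildElement AndAlso Not seen.ContainsKey(currentPath) Then
            seen.Add(currentPath, True)
            paths.Add(currentPath)
        End If
    End Sub

    Sub Main()
        ' The sample document from the question, inlined for the sketch.
        Dim doc As New XmlDocument()
        doc.LoadXml("<views><invoice><newRa elem=""0""><createD>20150514</createD>" & _
                    "<modD>1234</modD><sample>text</sample></newRa>" & _
                    "<total>1.99</total></invoice></views>")

        Dim paths As New List(Of String)()
        CollectPaths(doc.DocumentElement, "", New Dictionary(Of String, Boolean)(), paths)
        For Each p As String In paths
            Console.WriteLine(p)
        Next
    End Sub
End Module
```

For the sample document this should list the same four paths as above, in document order.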
Upvotes: 2