Gmac

Reputation: 169

Returning distinct paths in XML

I have an XML file from which I would like to retrieve all unique element paths. In the following example:

<?xml version="1.0" encoding="utf-8"?>
<views>
    <invoice>
        <newRa elem="0">
            <createD>20150514</createD>
            <modD>1234</modD>
            <sample>text</sample>
        </newRa>
        <total>1.99</total>
    </invoice>
</views>

I want to retrieve:

views/invoice/newRa/createD
views/invoice/newRa/modD
views/invoice/newRa/sample

and so on for every other distinct path.

I have some experience with XPath, but I'm not sure how to begin setting up a Sub in VB that will do this for me. Mind you, I'm working with .NET 2.0, so LINQ is not an option.

EDIT 1:

Dim xOne As New XmlDocument
xOne.Load("d/input/oneTest.xml")

For Each rNode As XmlNode In xOne.SelectSingleNode("/")
    If rNode.HasChildNodes Then
        subHasChild(rNode)
    End If
Next



Private Sub subHasChild(ByVal cNode As XmlNode)
    Dim sNode = cNode.Name

    If cNode.HasChildNodes Then
        sNode = sNode + "/" + cNode.FirstChild.Name
        cNode = cNode.FirstChild
        subHasChild(cNode)
    End If

    Dim sw As New StreamWriter("d:\input\paths.txt")
    sw.WriteLine(sNode)
    sw.Flush() : sw.Close() : sw.Dispose()
End Sub

Upvotes: 1

Views: 257

Answers (3)

Gmac

Reputation: 169

Thank you to EVERYONE who chimed in with responses. After researching all sorts of ways to do this, I ended up using a dictionary to get all unique paths. For anyone who may come across a similar scenario, here is what I used:

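' Requires: Imports System.Collections.Generic, System.IO, System.Xml
Dim xdDoc As New XmlDocument
Dim sw As New StreamWriter("Output File Path")
Dim diElements As New Dictionary(Of String, Integer)

xdDoc.Load("File Path")

' Visit every element and build its path by walking up through its ancestors.
For Each rootNode As XmlNode In xdDoc.SelectNodes("//*")
    Dim current As XmlNode = rootNode
    Dim sNode As String = current.Name

    ' Stop below "invoice" (or the document node) so paths are relative to it.
    ' Names are compared by value with <>; Is would compare object references.
    While current.ParentNode IsNot Nothing _
    AndAlso current.ParentNode.Name <> "invoice" _
    AndAlso current.ParentNode.Name <> "#document"
        current = current.ParentNode
        sNode = current.Name + "/" + sNode
    End While

    ' Count how many times each distinct path occurs.
    If Not diElements.ContainsKey(sNode) Then
        diElements.Add(sNode, 1)
    Else
        diElements(sNode) += 1
    End If
Next

' Write each distinct path preceded by its occurrence count.
For Each pair As KeyValuePair(Of String, Integer) In diElements
    sw.WriteLine("{0} --- {1}", pair.Value, pair.Key)
Next

sw.Flush() : sw.Close() : sw.Dispose()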

Upvotes: 1

Tony Hinkle

Reputation: 4742

This was a lot uglier than I expected. I'm not really a good programmer, but I can usually figure out how to get things done. My code is typically for small, limited-use utilities, so it just needs to work.

Note: Now updated to output only unique paths

Private PathArray As New ArrayList

Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load

    Dim xDoc As New XmlDocument
    Dim Output As String = ""

    xDoc.Load("C:\inetpub\wwwroot\SqlMonitor\MonitorConfig.xml")
    NodeRecurser(xDoc.SelectSingleNode("/"))

    For Each item In PathArray
        Output += item & vbCrLf
    Next

    MsgBox(Output)

    Me.Close()

End Sub

Sub NodeRecurser(xNode As XmlNode)

    If xNode.HasChildNodes Then

        For Each cNode As XmlNode In xNode.ChildNodes

            NodeRecurser(cNode)

        Next

    Else : GetPath(xNode)

    End If

End Sub

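Sub GetPath(n As XmlNode)

    ' Start with the node's own name when it is an (empty) element;
    ' text and other non-element leaves contribute no name of their own.
    Dim xPath As String = ""
    If n.NodeType = XmlNodeType.Element Then xPath = n.Name

    ' Walk up, prepending each ancestor's name, until the document node is reached.
    ' Building the path this way avoids a trailing "/".
    Do While n.ParentNode IsNot Nothing AndAlso n.ParentNode.NodeType <> XmlNodeType.Document

        n = n.ParentNode
        If xPath.Length > 0 Then xPath = n.Name & "/" & xPath Else xPath = n.Name

    Loop

    ' Keep only paths that have not been seen before.
    If xPath.Length > 0 AndAlso Not PathArray.Contains(xPath) Then PathArray.Add(xPath)

End Sub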

Upvotes: 0

Enigmativity

Reputation: 117084

Try this:

    Dim xd = <?xml version="1.0" encoding="utf-8"?>
<views>
    <invoice>
        <newRa elem="0">
            <createD>20150514</createD>
            <modD>1234</modD>
            <sample>text</sample>
        </newRa>
        <total>1.99</total>
    </invoice>
</views>

    Dim getPaths As Func(Of XElement, IEnumerable(Of String)) = Nothing
    getPaths = Function(xe) _
        If(xe.Elements().Any(), _
            xe.Elements() _
                .SelectMany( _
                    Function(x) getPaths(x), _
                    Function(x, p) xe.Name.ToString() + "/" + p) _
                .Distinct(), _
            { xe.Name.ToString() })

    Dim paths = getPaths(xd.Root)

It gives me:

views/invoice/newRa/createD 
views/invoice/newRa/modD 
views/invoice/newRa/sample 
views/invoice/total 

It correctly gets rid of duplicate paths.
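
Since the question mentions .NET 2.0, where LINQ is not available, the same recursive idea can probably be expressed with XmlDocument and a Dictionary for de-duplication. This is only a rough sketch under that assumption, not a tested drop-in: the module name UniquePaths and the Sub name CollectPaths are made up for illustration, and the XML is the sample from the question loaded from a string.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module UniquePaths   ' hypothetical name, for illustration only

    ' Walks the element tree depth-first and records each leaf element's
    ' path once. "seen" is used only for fast duplicate checks; "paths"
    ' keeps the paths in document order.
    Sub CollectPaths(ByVal node As XmlNode, ByVal prefix As String, _
                     ByVal seen As Dictionary(Of String, Boolean), _
                     ByVal paths As List(Of String))
        Dim path As String = node.Name
        If prefix.Length > 0 Then path = prefix & "/" & node.Name

        ' Recurse into child elements only; text nodes and comments are skipped.
        Dim hasChildElement As Boolean = False
        For Each child As XmlNode In node.ChildNodes
            If child.NodeType = XmlNodeType.Element Then
                hasChildElement = True
                CollectPaths(child, path, seen, paths)
            End If
        Next

        ' An element with no child elements is a leaf: keep its path if new.
        If Not hasChildElement AndAlso Not seen.ContainsKey(path) Then
            seen.Add(path, True)
            paths.Add(path)
        End If
    End Sub

    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml("<views><invoice><newRa elem=""0"">" & _
                    "<createD>20150514</createD><modD>1234</modD>" & _
                    "<sample>text</sample></newRa>" & _
                    "<total>1.99</total></invoice></views>")

        Dim seen As New Dictionary(Of String, Boolean)()
        Dim paths As New List(Of String)()
        CollectPaths(doc.DocumentElement, "", seen, paths)

        For Each p As String In paths
            Console.WriteLine(p)
        Next
    End Sub

End Module
```

For the sample document this should list the same four paths shown above. Explicit types are used throughout because local type inference is not available in the VB compiler that shipped with .NET 2.0.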

Upvotes: 2
