foreachin

Reputation: 91

Parsing complicated xml with VB.Net: elements into strings depending on namespace

I have the following xml in a file (simplified):

<?xml version="1.0" encoding="ISO-8859-1"?>
<XCer xmlns="http://x.y.z" xmlns:xsi="http://www.x.y.z" xsi:schemaLocation="http://www.x.y.z" track_id="559" mp_id="398" sub_id="569">
<capability xsi:type="XCracT">
<type>rec</type>
<sub_type>pc</sub_type>
<action>reco</action>
</capability>

<final_result OD="DGS=1.6" creator="Creator1" version="1.11" xsi:type="XCarT">
<code>300000000</code>
<code_cnf>0.7454</code_cnf>
<code_attr>seq</code_attr>
<status_attr>fdos</status_attr>
<text>this text</text>
<standardized_text>other text</standardized_text>
<region>
  <type>add</type>
  <symbology>machine</symbology>
</region>
</final_result>

<final_result OD="DGS=1.7" creator="Creator2" version="1.11" xsi:type="XCarT">
<code>3040280100015</code>
<code_cnf>0.7454</code_cnf>
<code_attr>seq</code_attr>
<status_attr>fdos</status_attr>
<text>this text</text>
<standardized_text>other text</standardized_text>
<region>
  <type>add</type>
  <symbology>machine</symbology>
</region>
    <polygon>
    <dot x="849" y="1600"/>
    <dot x="823" y="1600"/>
    <dot x="819" y="1166"/>
    <dot x="845" y="1166"/>
    </polygon>
</final_result>
</XCer>

At a very basic level, I want to create three variables (creator, code and mp_id) and fill them with the details from the final_result OD="DGS=1.6" section, i.e. 'Creator1', '300000000' and '398' (the last one from the XCer root element). I would also like the text, 'this text'. My XML skills are sorely lacking, despite trying several crash courses over the last couple of days.

I have tried the basic

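Dim code As String = Nothing
Using reader As XmlReader = XmlReader.Create("C:\filename.xml")
    While reader.Read()
        If reader.IsStartElement() AndAlso reader.Name = "code" Then
            ' Runs for every <code> element, so the last one read wins
            code = reader.ReadElementContentAsString()
        End If
    End While
End Using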

which gets me the code, but I cannot get any of the attribute values on the line
<final_result OD="DGS=1.6" creator="Creator1" version="1.11" xsi:type="XCarT">

and I cannot restrict the code element I want to that subtree, so each later code overwrites the previous one.

Upvotes: 2

Views: 1696

Answers (2)

kiran patel

Reputation: 46

If you don't know the underlying XML structure and are trying to find a node by its tag name, note that the XML may declare a namespace for the tag you are searching for. In the example below we search for the tag CashInAccountNumber, but in the actual XML it carries a namespace prefix, as in "bctr:CashInAccountNumber". The XML structure is as follows:

<xfa:data xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/"><bctr:BSAForm xmlns:bctr="http://www.fincen.gov/bsa/bctr/2011-06-01" xmlns:cc="http://www.fincen.gov/bsa/common-components/2009-01-01" xmlns:est="http://www.fincen.gov/bsa/efile-submission-types/2009-01-01" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><bctr:PersonInformation><bctr:CashInAccountNumber><bctr:CashInAccountNumber /></bctr:CashInAccountNumber><bctr:CashOutAccountNumber><bctr:CashOutAccountNumber /></bctr:CashOutAccountNumber></bctr:PersonInformation><xfdf:field xmlns:xfdf="http://ns.adobe.com/xfdf/" xmlns:xfdfi="http://ns.adobe.com/xfdf-transition/" xfdfi:original="FSAPPLICATIONDATA_">n6835LaAwLhBAAA</xfdf:field><xfdf:field xmlns:xfdf="http://ns.adobe.com/xfdf/" xmlns:xfdfi="http://ns.adobe.com/xfdf-transition/" xfdfi:original="FSTARGETURL_">https://sdtmut1.fincen.treas.gov/AltSubmitServlet</xfdf:field></bctr:BSAForm><FSTEMPLATE_ /><FSFORMQUERY_>BCTR.pdf</FSFORMQUERY_><FSTRANSFORMATIONID_>PDFForm</FSTRANSFORMATIONID_><FSTARGETURL_>https://sdtmut1.fincen.treas.gov/AltSubmitServlet</FSTARGETURL_><FSAWR_>https://sdtmut1.fincen.treas.gov/</FSAWR_><FSWR_>https://sdtmut1.fincen.treas.gov/FormServer</FSWR_><FSCRURI_>/opt/weblogic/user_projects/domains/BSADomain/applications/FinCEN1/forms</FSCRURI_><FSBASEURL_>https://sdtmut1.fincen.treas.gov/</FSBASEURL_></xfa:data>

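In such cases, first try to get the value with the Descendants("CashInAccountNumber") method. If the element is still not found, search with the code snippet below, which collects every namespace declared in the document and tries the tag name in each one: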

    ' Requires: Imports System.IO, System.Xml.XPath, System.Xml.Linq
    Try
        ' Maps each namespace prefix to its URI
        Dim dictNameSpace As New Dictionary(Of String, String)

        ' Collect every distinct namespace declared anywhere in the document
        Using fs As New FileStream("Your XML File Path", FileMode.Open, FileAccess.Read)
            Dim objDocument As New XPathDocument(fs)
            Dim objNavigator As XPathNavigator = objDocument.CreateNavigator()
            Dim objNodeIterator As XPathNodeIterator = objNavigator.Select("//namespace::*[not(. = ../../namespace::*)]")
            While objNodeIterator.MoveNext()
                Dim sKey As String = objNodeIterator.Current.LocalName
                If Not dictNameSpace.ContainsKey(sKey) Then
                    dictNameSpace.Add(sKey, objNodeIterator.Current.Value)
                End If
            End While
        End Using

        Dim xmlr As XDocument = XDocument.Load("Your XML File Path")

        ' Look for the tag in each collected namespace in turn
        For Each kvp As KeyValuePair(Of String, String) In dictNameSpace
            Dim ns As XNamespace = kvp.Value
            For Each xEle As XElement In xmlr.Descendants(ns + "CashInAccountNumber")
                Console.WriteLine("Found Tag:" & xEle.Name.LocalName)
            Next xEle
        Next kvp

    Catch ex As Exception
        Console.WriteLine(ex.Message)
    End Try
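
A shorter alternative, swapped in here in place of the namespace dictionary, is to compare each element's local name, which ignores whichever namespace the tag belongs to. This is only a minimal sketch: the module and function names (LocalNameSearch, FindByLocalName) are made up for illustration, and the file path is the same placeholder as above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module LocalNameSearch
    ' Hypothetical helper: returns every element whose tag name, without its
    ' namespace, equals tagName.
    Function FindByLocalName(doc As XDocument, tagName As String) As IEnumerable(Of XElement)
        ' Descendants() with no argument walks every element in the document.
        Return doc.Descendants().Where(Function(e) e.Name.LocalName = tagName)
    End Function

    Sub Main()
        Dim xmlr As XDocument = XDocument.Load("Your XML File Path")
        For Each xEle As XElement In FindByLocalName(xmlr, "CashInAccountNumber")
            ' Name.ToString() includes the namespace URI, which shows where the tag lives.
            Console.WriteLine("Found Tag:" & xEle.Name.ToString())
        Next
    End Sub
End Module
```

The trade-off is that this matches the tag in any namespace, so it may return more elements than intended if two vocabularies reuse the same tag name.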

Upvotes: 0

Steven Doggart

Reputation: 43743

Here's a simple example of how to do it using XPath:

Dim doc As New XmlDocument()
doc.Load("Test.xml")
Dim namespaceManager As New XmlNamespaceManager(doc.NameTable)
namespaceManager.AddNamespace("x", "http://x.y.z")
Dim mp_id As String = doc.SelectSingleNode("/x:XCer[1]/@mp_id", namespaceManager).InnerText
Dim creator As String = doc.SelectSingleNode("/x:XCer[1]/x:final_result[@OD='DGS=1.6']/@creator", namespaceManager).InnerText
Dim code As String = doc.SelectSingleNode("/x:XCer[1]/x:final_result[@OD='DGS=1.6']/x:code", namespaceManager).InnerText

Notice that I needed to specify the namespace, since all the elements have a default namespace of http://x.y.z. For my purposes, I gave the namespace a prefix of x, but you could name it anything you want.

XPath is a standard language used for querying XML documents. Some people prefer Microsoft's proprietary LINQ technology, but since XPath is what is used by other languages, tools, and technologies, it is well worth taking the time to learn it. The SelectSingleNode and SelectNodes methods allow you to find matching nodes using XPath.

The first XPath, which selects the mp_id, looks like this:

/x:XCer[1]/@mp_id

  • / - Start at the root of the document
  • x:XCer - Find an element named XCer (in the x namespace)
  • [1] - Select only the first XCer element
  • / - Step from the first XCer element to one of its own nodes (a single / goes down exactly one level; // would search all descendants)
  • @mp_id - Select the mp_id attribute of that XCer element

The next XPath, which selects the creator attribute, looks like this:

/x:XCer[1]/x:final_result[@OD='DGS=1.6']/@creator

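This one starts the same as the last one, but instead of selecting an attribute of the XCer element, it selects a final_result child element. The [@OD='DGS=1.6'] part is a conditional clause (a predicate). You can read it as "select a final_result element whose OD attribute equals DGS=1.6". The third XPath follows the same path, except that its last step selects the code child element instead of the creator attribute.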
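
To keep each code tied to its own final_result subtree, which is the problem raised in the question, one option is to loop over every final_result with SelectNodes and run relative XPaths from each node. This is a minimal sketch that assumes the same Test.xml and x prefix as above; the module and function names (FinalResultDemo, DescribeResults) and the output format are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module FinalResultDemo
    ' Hypothetical helper: returns one "creator: code (text)" line per final_result.
    Function DescribeResults(doc As XmlDocument) As List(Of String)
        Dim namespaceManager As New XmlNamespaceManager(doc.NameTable)
        namespaceManager.AddNamespace("x", "http://x.y.z")

        Dim lines As New List(Of String)
        For Each result As XmlNode In doc.SelectNodes("/x:XCer/x:final_result", namespaceManager)
            ' These XPaths are relative to the current final_result,
            ' so values from different subtrees never mix.
            Dim creator As String = result.SelectSingleNode("@creator").InnerText
            Dim code As String = result.SelectSingleNode("x:code", namespaceManager).InnerText
            Dim resultText As String = result.SelectSingleNode("x:text", namespaceManager).InnerText
            lines.Add(creator & ": " & code & " (" & resultText & ")")
        Next
        Return lines
    End Function

    Sub Main()
        Dim doc As New XmlDocument()
        doc.Load("Test.xml") ' assumed to hold the XML from the question
        For Each entry As String In DescribeResults(doc)
            Console.WriteLine(entry)
        Next
    End Sub
End Module
```

Note that SelectSingleNode returns Nothing when nothing matches, so a final_result without a creator, code or text would need a check before reading InnerText.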

Upvotes: 2
