Thomaz Dias
Thomaz Dias

Reputation: 33

How to parse XML in VBA and retrieve specific values

I've already spent two weeks searching unsuccessfully how to parse one specific XML and fetch just few values. I already tried every single code on internet until I found one that solved part of my problem.

The XML i'm trying to fetch it's from U.S Department of Agriculture, and is free to access.

https://apps.fas.usda.gov/psdonline/app/index.html#/app/about

    Dim xmlDoc As MSXML2.DOMDocument60
    Dim xmlNode As MSXML2.IXMLDOMNode
    Dim xmlNodeList As MSXML2.IXMLDOMNodeList
    Dim myNode As MSXML2.IXMLDOMNode

    Dim URL As String, APIkey As String

    APIkey = "8DB688F8-1E22-4031-B581-59C221ECDDA6"

    URL = "https://apps.fas.usda.gov/PSDOnlineDataServices/api/CommodityData/GetCommodityDataByYear?commodityCode=2222000&marketYear=2018"

    Set xmlDoc = New MSXML2.DOMDocument60
    xmlDoc.async = False

    With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", URL, False
    .SetRequestHeader "Accept", "text/xml"
    .SetRequestHeader "API_KEY", APIkey
    .Send
    xmlDoc.loadXML .ResponseText
End With

Set xmlNodeList = xmlDoc.getElementsByTagName("*")
    For Each xmlNode In xmlNodeList
        For Each myNode In xmlNode.childNodes
          If myNode.nodeType = NODE_TEXT Then
            Debug.Print xmlNode.nodeName & "=" & xmlNode.text
          End If
        Next myNode
    Next xmlNode
    Set xmlDoc = Nothing
End Sub

The response of this code show the entire XML listed, but when I try to find one specific node, the code result it's nothing.

in

Set xmlNodeList = xmlDoc.getElementsByTagName("*")

I've tried to use the address "//AttributeDescription", but apparently just work using the "*".

I need to receive, for example, The response below:

AttributeDescription=Production

CountryName=Brazil

Value=0.00000

I did my best trying to get the right response and I also consider that the XML structure it's not in the right format due the lack of response when addressing...

Is there anything that I can do to solve this issue?

Upvotes: 2

Views: 1683

Answers (3)

Thomaz Dias
Thomaz Dias

Reputation: 33

I've reached this solution, mixing the two answers, and sharing the code to help others.

First I set the property and then used the iteration to retrieve the values I needed, I don't know if this is the best solution, since I can't control the XML structure and if they change their file I'll need to return to this code.

I tried to work in a "Safety Line" to avoid any mistake in the output, but no problem for me to double check since I have access to the data itself.

If node.childNodes(0).text = "Production" And node.childNodes(6).text = "Argentina" Then

To ensure that the name and response will bring whatever I want.

Public Sub test_3()
    Dim xmlDoc As MSXML2.DOMDocument60
    Dim URL As String, APIkey As String

    APIkey = "8DB688F8-1E22-4031-B581-59C221ECDDA6"

    URL = "https://apps.fas.usda.gov/PSDOnlineDataServices/api/CommodityData/GetCommodityDataByYear?commodityCode=2222000&marketYear=2018"

    Set xmlDoc = New MSXML2.DOMDocument60

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .SetRequestHeader "Accept", "text/xml"
        .SetRequestHeader "API_KEY", APIkey
        .Send
        xmlDoc.loadXML .ResponseText
        xmlDoc.SetProperty "SelectionNamespaces", "xmlns:r='http://schemas.datacontract.org/2004/07/PSDOnline.DataServices.Models'"
    End With

    Dim node As IXMLDOMElement, r As Long

    For Each node In xmlDoc.selectNodes("//r:CommodityData")
        If node.childNodes(0).text = "Production" And node.childNodes(6).text = "Argentina" Then
        r = r + 1
        Debug.Print node.childNodes(0).text
        Debug.Print node.childNodes(6).text
        Debug.Print node.LastChild.text
        'With ActiveSheet
            '.Cells(r, 1) = node.childNodes(0).text
            '.Cells(r, 2) = node.childNodes(6).text
            '.Cells(r, 3) = node.LastChild.text
        'End With
        End If
    Next
End Sub

This solution return the following response in the DEBUGGER:

Production

Argentina

55300.0000

Exactly what I wanted.

Thanks again for the time and for sharing knowledge.

Upvotes: 1

barrowc
barrowc

Reputation: 10679

There are two separate issues here.

MSXML2 has issues using XPath when the XML document has a default namespace - see here for details. At the start of the downloaded document from the USDA site, there are some namespace declarations:

<ArrayOfCommodityData xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/PSDOnline.DataServices.Models">

There are two namespaces declared here. One with the prefix i and a default namespace that covers any element which does not have a namespace prefix. If you look at a "CalendarYear" entry in the XML document - <CalendarYear i:nil="true" /> - then you can see that "CalendarYear" is in the default namespace whereas "nil" is in the "i" namespace.

To make MSXML2 work with default namespaces, you have to declare a namespace which has the same URI as the default namespace. This is done using the SelectionNamespaces property of the XML document, like this:

xmlDoc.SetProperty "SelectionNamespaces", "xmlns:r='http://schemas.datacontract.org/2004/07/PSDOnline.DataServices.Models'"

I chose r as the namespace but the name you choose is irrelevant - it just has to be different from any other namespaces in the document.

This leads on to the second problem. You are using getElementsByTagName which just takes a tag name as a parameter but you are passing in an XPath string. To deal with an XPath string, you need to use SelectNodes instead and you need to use the namespace we added, like this:

Set xmlNodeList = xmlDoc.SelectNodes("//r:AttributeDescription")

Upvotes: 2

QHarr
QHarr

Reputation: 84465

It's a namespace issue I think. There are people more familiar with this who can likely fix how to add properly and then reference. I did try adding the two namespaces with the usual syntax .setProperty "SelectionNamespaces", namespace but still failed to set objects so guess I did something wrong.

An interim, less robust solution is as follows:

Option Explicit
Public Sub test()
    Dim xmlDoc As MSXML2.DOMDocument60
    Dim URL As String, APIkey As String

    APIkey = "key"

    URL = "https://apps.fas.usda.gov/PSDOnlineDataServices/api/CommodityData/GetCommodityDataByYear?commodityCode=2222000&marketYear=2018"

    Set xmlDoc = New MSXML2.DOMDocument60

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .SetRequestHeader "Accept", "text/xml"
        .SetRequestHeader "API_KEY", APIkey
        .Send
        xmlDoc.LoadXML .responseText
    End With

    Dim node As IXMLDOMElement, r As Long
    For Each node In xmlDoc.SelectNodes("/*[name()='ArrayOfCommodityData']/*[name()='CommodityData']")
        r = r + 1
        With ActiveSheet
            .Cells(r, 1) = node.ChildNodes(0).Text
            .Cells(r, 2) = node.ChildNodes(6).Text
            .Cells(r, 3) = node.ChildNodes(11).Text
        End With
    Next
End Sub

Upvotes: 3

Related Questions