gaut

Reputation: 5958

R proper way to parse xml

I have an XML response containing Header and Body nodes. How can I access the value of the $Envelope$Body$checkVatResponse$valid node?

For some reason I can't even find the Body node using xml_find_all:

library(httr)
library(dplyr)
library(rvest)
library(xml2)

body = r'[<?xml version="1.0" encoding="UTF-8" standalone="no"?>
             <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" >
             <soapenv:Header/>
             <soapenv:Body>
             <urn:checkVat  xmlns:urn="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
             <urn:countryCode>NL</urn:countryCode>
             <urn:vatNumber>800938495B01</urn:vatNumber>
             </urn:checkVat>
             </soapenv:Body>
             </soapenv:Envelope>]'

r <- POST("http://ec.europa.eu/taxation_customs/vies/services/checkVatTestService", body = body)
stop_for_status(r)
content(r) %>% xml_find_all('//Body')
content(r) %>% xml2::as_list()
res <- content(r) 

xml_children(res) %>% xml_name()
# [1] "Header" "Body"  
xml_find_all(res,'.//Body')
# {xml_nodeset (0)}

Upvotes: 0

Views: 253

Answers (1)

MrFlick

Reputation: 206232

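When working with XML data, you need to be mindful of the namespaces used in the file. You need to prefix namespaced nodes with the correct namespace in the XPath expression. An unprefixed name such as Body only matches elements that are in no namespace, which is why '//Body' returns an empty nodeset. To extract the valid value you can use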

content(r) %>% xml_find_all('//env:Body/ns2:checkVatResponse/ns2:valid')

To see all the namespaces used by the file you can run

content(r) %>% xml_ns()
# env <-> http://schemas.xmlsoap.org/soap/envelope/
# ns2 <-> urn:ec.europa.eu:taxud:vies:services:checkVat:types
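To get the value itself rather than a nodeset, the lookup can be combined with xml_text(). The following is a minimal sketch that runs offline: sample_xml is a hand-written stand-in for the service response (an assumption, not captured output), reusing the env and ns2 prefixes reported by xml_ns() above. It also shows a namespace-agnostic alternative based on the XPath local-name() function.

```r
library(xml2)

# Hypothetical stand-in for the SOAP response; the real payload may differ.
sample_xml <- '<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Header/>
  <env:Body>
    <ns2:checkVatResponse xmlns:ns2="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
      <ns2:countryCode>NL</ns2:countryCode>
      <ns2:valid>true</ns2:valid>
    </ns2:checkVatResponse>
  </env:Body>
</env:Envelope>'

doc <- read_xml(sample_xml)

# Option 1: namespace-qualified XPath. The prefixes are resolved through
# xml_ns(doc), which is the default for the ns argument.
valid_node <- xml_find_first(doc, "//env:Body/ns2:checkVatResponse/ns2:valid")
valid_text <- xml_text(valid_node)

# Convert the text to a logical flag.
is_valid <- identical(valid_text, "true")

# Option 2: ignore namespaces by matching on the element's local name.
valid_local <- xml_text(xml_find_first(doc, "//*[local-name()='valid']"))
```

With a live response, doc would be content(r) instead of the parsed sample string. Option 1 is stricter, since it only matches the element in the expected namespace. Option 2 is convenient when the prefixes are not known in advance, but it matches any element named valid regardless of namespace.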

Upvotes: 1
