Reputation: 296
I have two types of XML, myxml and myxml2, that I turn into data frames (DF). The nodes of these two XML files are named differently: one uses the prefix test and the other uses the prefix test2. See below:
##myxml
<?xml version="1.0" encoding="ISO-8859-1" ?>
<test:TASS xmlns="http://www.vvv.com/schemas" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.vvv.com/schemas http://www.vvv.com/schemas/testV2_02_03.xsd" xmlns:test="http://www.vvv.com/schemas" >
<test:billing>
<test:proceduresummary>
<test:guidenumber>X2030</test:guidenumber>
<test:diagnosis>
<test:table>ICD-10</test:table>
<test:diagnosiscod>J441</test:diagnosiscod>
<test:description>CHRONIC OBSTRUCTIVE PULMONARY DISEASE WITH (ACUTE) EXACERBATION</test:description>
</test:diagnosis>
<test:procedure>
<test:procedure>
<test:description>HOSPITAL</test:description>
</test:procedure>
<test:amount>12</test:amount>
</test:procedure>
</test:proceduresummary>
</test:billing>
</test:TASS>
Code to turn into DF (xml=test)
library(tidyverse)
library(xml2)
setwd("D:/")
myxml <- read_xml("test.xml")
# keep only the billing nodes
myxml <- myxml %>% xml_find_all(".//test:billing")
# nested list -> JSON -> data frame
billing1 <- xml2::as_list(myxml) %>% jsonlite::toJSON() %>% jsonlite::fromJSON()
Now my XML = test2
##myxml2
<?xml version="1.0" encoding="ISO-8859-1" ?>
<test2:TASS xmlns="http://www.vvv.com/schemas" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.vvv.com/schemas http://www.vvv.com/schemas/testV2_02_03.xsd" xmlns:test2="http://www.vvv.com/schemas" >
<test2:billing>
<test2:proceduresummary>
<test2:guidenumber>Z4088</test2:guidenumber>
<test2:diagnosis>
<test2:table>ICD-10</test2:table>
<test2:diagnosiscod>G93</test2:diagnosiscod>
<test2:description>DISORDER OF BRAIN, UNSPECIFIED</test2:description>
</test2:diagnosis>
<test2:procedure>
<test2:procedure>
<test2:description>HOSPITAL</test2:description>
</test2:procedure>
<test2:amount>15</test2:amount>
</test2:procedure>
</test2:proceduresummary>
</test2:billing>
</test2:TASS>
Code to turn into DF (xml=test2)
setwd("D:/")
myxml2 <- read_xml("test2.xml")
# same steps as above, but with the test2 prefix
myxml2 <- myxml2 %>% xml_find_all(".//test2:billing")
billing2 <- xml2::as_list(myxml2) %>% jsonlite::toJSON() %>% jsonlite::fromJSON()
I need code that renames these nodes. I thought of applying a function before "xml_find_all" that swaps the prefix on every node, so that on import all nodes would be called "bd" instead of "test" or "test2". Is this possible?
Upvotes: 1
Views: 180
Reputation: 24079
The second file is the same as the first except that "test:" was changed to "test2:". One option is therefore to store all of the XPath expressions in a vector and use the sub function to make the substitution.
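A minimal sketch of that substitution idea, assuming the XPath expressions are kept in a named character vector (the vector names and the second expression are illustrative, not from the question). It uses gsub rather than sub because sub changes only the first match in each string, which would leave later steps of a multi-step path untouched.

```r
# XPath expressions written for the "test" prefix (illustrative names)
xpaths <- c(
  billing = ".//test:billing",
  amount  = ".//test:procedure/test:amount"
)

# gsub() replaces every occurrence of the prefix in each expression;
# sub() would only replace the first one.
xpaths2 <- gsub("test:", "test2:", xpaths, fixed = TRUE)
```

The rewritten vector can then be passed to xml_find_all in place of the original expressions when the file uses the test2 prefix.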
Because the namespaces are defined in this file, a more direct approach is to rename the namespace prefix and reuse the existing code, as shown below. This method works for this file; I can't say it is a completely general approach.
library(xml2)
library(magrittr) # provides %>% (not needed if tidyverse is already loaded)
page<-read_xml('<?xml version="1.0" encoding="ISO-8859-1" ?>
<test2:TASS xmlns="http://www.vvv.com/schemas"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.vvv.com/schemas http://www.vvv.com/schemas/testV2_02_03.xsd"
xmlns:test2="http://www.vvv.com/schemas" >
<test2:billing>
<test2:proceduresummary>
<test2:guidenumber>Z4088</test2:guidenumber>
<test2:diagnosis>
<test2:table>ICD-10</test2:table>
<test2:diagnosiscod>G93</test2:diagnosiscod>
<test2:description>DISORDER OF BRAIN, UNSPECIFIED</test2:description>
</test2:diagnosis>
<test2:procedure>
<test2:procedure>
<test2:description>HOSPITAL</test2:description>
</test2:procedure>
<test2:amount>15</test2:amount>
</test2:procedure>
</test2:proceduresummary>
</test2:billing>
</test2:TASS>')
#xml_ns_strip(page)
#display the name spaces
xml_ns(page)
# d1 <-> http://www.vvv.com/schemas
# test2 <-> http://www.vvv.com/schemas
# xsi <-> http://www.w3.org/2001/XMLSchema-instance
#rename test2 to become test to reuse existing code
ns<-xml_ns_rename(xml_ns(page), test2 = "test")
t2<-page %>% xml_find_all(".//test:billing", ns) #also works
#demonstration purposes (not needed in production code)
t1<-page %>% xml_find_all(".//test2:billing") # works
t3<-page %>% xml_find_all(".//test:billing") # fails
identical(t1, t2)
#[1] TRUE
#end of demo
Edit:
As per your comments, if you don't know in advance whether the XML uses test or test2 as the namespace prefix, you can use the following check to decide whether the prefix needs to be renamed.
if ("test2" %in% names(xml_ns(page))) {
  # the document uses the test2 prefix: expose it as test
  ns <- xml_ns_rename(xml_ns(page), test2 = "test")
} else {
  # the document already uses the test prefix
  ns <- xml_ns(page)
}
#this should now work for both cases.
page %>% xml_find_all(".//test:billing", ns)
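The question asked for a single name such as "bd" for both files. A possible sketch, assuming both files always bind their prefix to the same namespace URI (as they do in the samples): supply your own prefix-to-URI mapping through the ns argument of xml_find_all, so the prefix used inside the file no longer matters. The helper name find_billing and the shortened sample documents are hypothetical.

```r
library(xml2)

# Namespace URI shared by both sample files
schema_uri <- "http://www.vvv.com/schemas"

# Hypothetical helper: query with our own prefix "bd", whatever
# prefix (test, test2, ...) the document itself uses for this URI.
find_billing <- function(doc, uri = schema_uri) {
  xml_find_all(doc, ".//bd:billing", ns = c(bd = uri))
}

# Shortened stand-ins for the two files in the question
doc1 <- read_xml('<test:TASS xmlns:test="http://www.vvv.com/schemas"><test:billing><test:amount>12</test:amount></test:billing></test:TASS>')
doc2 <- read_xml('<test2:TASS xmlns:test2="http://www.vvv.com/schemas"><test2:billing><test2:amount>15</test2:amount></test2:billing></test2:TASS>')

billing1 <- find_billing(doc1)
billing2 <- find_billing(doc2)

# Child lookups use the same custom mapping
amount1 <- xml_text(xml_find_first(billing1, ".//bd:amount", ns = c(bd = schema_uri)))
amount2 <- xml_text(xml_find_first(billing2, ".//bd:amount", ns = c(bd = schema_uri)))
```

This changes only how the XPath is resolved; the element names stored in the document, and therefore the names produced by as_list, keep their original form.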
Upvotes: 1