An economist
An economist

Reputation: 1311

Parsing XML to R

I want to extract exchange rates from the ECB website to convert my local currencies data. However, I am struggling a lot with using xpath (although this helped me a lot).

library(XML)

fileURL <- "https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml?93aad09b8f8b7bdb69cd1574b5b2665f"
download.file(fileURL, destfile=tf <- tempfile(fileext=".xml"))

xml_file <- xmlParse(tf)
xml_data <- xmlRoot(xml_file)

currency <- xml_data[["number(//Cube/@currency)"]]
rate <- xml_data[["number(//Cube/@rate)"]]

Then I just want to create simple data frame:

df <- data.frame(currency, rate)

Upvotes: 4

Views: 912

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269556

1) xpathSApply The following line gives a character matrix m with currency and rate columns:

m <- t(xpathSApply(xml_data, "//*[@rate]", xmlAttrs))

If needed in the form of a data frame with character and numeric columns add this:

read.table(text = paste(m[, 1], m[, 2]), as.is = TRUE)

Note: We avoided having to deal with namespaces by using * in the XPath expression but if it were desired to explicitly refer to Cube, as in the question, then it would be done like this:

m <- xpathSApply(xml_data, "//x:Cube[@rate]", xmlAttrs, namespaces = "x")

2) read.pattern An alternative way is to parse the XML file using read.pattern in gsubfn. (This does not use the XML package.)

library(gsubfn)
read.pattern(tf, pattern = "'(...)' rate='([0-9.]+)'", col.names = c("currency", "rate"))

Upvotes: 4

hrbrmstr
hrbrmstr

Reputation: 78792

Alternate version of G's answer in the Hadleyverse:

library(xml2)
library(purrr)
library(readr)

fileURL <- "https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml?93aad09b8f8b7bdb69cd1574b5b2665f"
doc <- read_xml(fileURL)
dat <- type_convert(map_df(xml_attrs(xml_find_all(doc, "//*[@rate]")), as.list))

dat

## Source: local data frame [31 x 2]
## 
##    currency     rate
##       (chr)    (dbl)
## 1       USD   1.0933
## 2       JPY 130.5800
## 3       BGN   1.9558
## 4       CZK  27.0250
## 5       DKK   7.4623
## 6       GBP   0.7533
## 7       HUF 310.5500
## 8       PLN   4.3982
## 9       RON   4.5093
## 10      SEK   9.3580
## ..      ...      ...

You get typed columns in a data frame with this approach, too.

Upvotes: 4

Related Questions