Etienne Low-Décarie

Reputation: 13443

How to extract XML data from a CrossRef query result using R?

If you substitute your CrossRef email for your.crossref.email, the following URL returns an XML file:

"http://www.crossref.org/openurl?title=Science&aulast=Fernández&date=2009&multihit=true&pid=your.crossref.email"

An example file is available here:

crossref.xml

I wish to extract the list of DOIs (Digital Object Identifiers) into a data.frame in R, using one of the general-purpose R XML packages:

library(XML) or library(tm)

I have tried

doc <- xmlTreeParse(file)  # file: path to the downloaded XML, e.g. "crossref.xml"
top <- xmlRoot(doc)

but cannot figure out how to proceed from here. For example,

top[[1]]["doi"]

does not work.

Upvotes: 3

Views: 671

Answers (3)

Wyatt

Reputation: 51

I had exactly the same problem. I spent a day and a half searching before I finally came across this post.

Thanks!!!

Upvotes: 0

sckott

Reputation: 5893

As part of rOpenSci, some others and I have written functions for querying the Crossref API: crossref and crossref_r.

Upvotes: 2

G. Grothendieck

Reputation: 269694

Try this:

library(XML)
doc <- xmlTreeParse("crossref.xml", useInternalNodes = TRUE)
root <- xmlRoot(doc)
xpathSApply(root, "//x:doi", xmlValue, namespaces = "x")
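The question asks for a data.frame, so the character vector returned by xpathSApply can be wrapped in one. The following is a minimal sketch rather than a definitive solution: it parses a small inline XML string as a hypothetical stand-in for crossref.xml (the namespace URI, element layout, and DOI values are assumptions made up for illustration), and the name doi_df is likewise arbitrary.

```r
library(XML)

# Hypothetical stand-in for crossref.xml: a document with a default
# namespace and two <doi> nodes. The real file's namespace and layout may differ.
xml_text <- '<crossref_result xmlns="http://example.org/ns">
  <query_result><body>
    <query><doi type="journal_article">10.1000/example.1</doi></query>
    <query><doi type="journal_article">10.1000/example.2</doi></query>
  </body></query_result>
</crossref_result>'

doc <- xmlParse(xml_text, asText = TRUE)
root <- xmlRoot(doc)

# An unnamed prefix ("x") is bound to the document's default namespace,
# which is why "//x:doi" matches while a plain "//doi" would not.
dois <- xpathSApply(root, "//x:doi", xmlValue, namespaces = "x")

# Collect the DOIs into a one-column data.frame.
doi_df <- data.frame(doi = dois, stringsAsFactors = FALSE)
print(doi_df)
```

To run this against the real file, replace the xmlParse call with xmlParse("crossref.xml"). The default namespace is also the likely reason that indexing by the plain name "doi" found nothing in the original attempt.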

Upvotes: 2
