Sarah Hailey

Reputation: 494

XML R How to retrieve values (could this be a namespace issue?)

Just when I thought I understood XPath! I must be missing something really simple, but I can't select the value of the node "citedby-count" in the following:

library(XML)

xml <- "<?xml version='1.0' encoding='UTF-8'?>
        <search-results xmlns='http://www.w3.org/2005/Atom' xmlns:cto='http://www.elsevier.com/xml/cto/dtd' xmlns:atom='http://www.w3.org/2005/Atom' xmlns:prism='http://prismstandard.org/namespaces/basic/2.0/' xmlns:opensearch='http://a9.com/-/spec/opensearch/1.1/' xmlns:dc='http://purl.org/dc/elements/1.1/'>

            <entry>
                 <prism:url>http://api.elsevier.com/content/abstract/scopus_id/111111</prism:url>
                 <dc:title>Paper Title</dc:title>
                 <citedby-count>1</citedby-count>
            </entry> 
        </search-results>"

doc <- xmlParse(xml)

I've tried

doc["//citedby-count"]

and

doc["//{'citedby-count'}"]

and

doc["//entry"]

but all return

list()
attr(,"class")
[1] "XMLNodeSet"

however,

doc["//dc:title"] 

works just fine.

Have I just been looking at this too long? Please help!

**Edit:** I thought this was because of the hyphen, but it can't be, because

doc["//entry"] 

doesn't work either.

Upvotes: 0

Views: 402

Answers (2)

har07

Reputation: 89285

A namespace prefix is normally declared as xmlns:foo="...", where foo is the prefix, and it is used explicitly in element names, as in <foo:bar>, where bar is the element's local name. There is also the default namespace, which is declared without a prefix, as xmlns="...". It applies implicitly to the element on which it is declared and to all of that element's descendants, unless a descendant overrides it by declaring its own default namespace or by using an explicit prefix in its name.

That's the first part of the story, which is about namespaces in XML. XPath, on the other hand, has no notion of a default namespace: in XPath, an element name without a prefix always refers to an element in no namespace. That is why //entry and //citedby-count match nothing here, since both elements are in the document's default namespace. To bridge this difference, when you need to query an element in the default namespace, you usually define a prefix that maps to the XML's default namespace URI and use that prefix in the XPath expression. That's basically what @hrbrmstr suggested in the first comment, something like the following (the prefix can be anything, as long as it is mapped to the correct namespace URI):

doc["//d:citedby-count", namespaces=c(d="http://www.w3.org/2005/Atom")]

It turns out, though, that your XML already declares an explicit prefix, atom, that points to the same namespace URI, so you can use it directly, for example //atom:citedby-count.
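As a minimal sketch of the prefix-mapping approach, assuming the XML package is installed: the document below is a trimmed copy of the one in the question, and the variable names (atom_ns, by_prefix, by_local_name) are only illustrative. The second query, which matches on local-name(), is a general XPath alternative that ignores namespaces altogether. It is not part of the suggestion above and can match more than intended if elements in different namespaces share a local name.

```r
library(XML)

# Trimmed version of the document from the question. The default namespace
# (xmlns=...) and the atom prefix both map to the same Atom URI.
xml <- "<search-results xmlns='http://www.w3.org/2005/Atom'
                        xmlns:atom='http://www.w3.org/2005/Atom'
                        xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <entry>
    <dc:title>Paper Title</dc:title>
    <citedby-count>1</citedby-count>
  </entry>
</search-results>"

doc <- xmlParse(xml, asText = TRUE)

# Map an arbitrary prefix (here "d") to the default namespace URI.
atom_ns <- c(d = "http://www.w3.org/2005/Atom")

# Query with the mapped prefix and extract the text of each matching node.
by_prefix <- xpathSApply(doc, "//d:citedby-count", xmlValue,
                         namespaces = atom_ns)

# Alternative: match on the local name only, ignoring the namespace.
by_local_name <- xpathSApply(doc, "//*[local-name() = 'citedby-count']",
                             xmlValue)

print(by_prefix)
print(by_local_name)
```

With the trimmed document above, both queries should return the text of the single citedby-count element.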

Upvotes: 1

Nico21

Reputation: 97

You can also use doc["//x:citedby-count", namespace = "x"] to handle the default namespace; this comes from the examples in the xpathApply documentation.

Upvotes: 0
