Reputation: 9228
Given a Scala XML object, can I run an XPath string query such as "//entries[@title='scala']"?
Ideally, it would be like:
<a><b name='n1'></b></a>.xpath("//b[@name='n1']")
I can't manually convert all the XPath queries to Scala's built-in XPath-like method calls, because my program accepts XPath queries dynamically.
Additionally, the built-in Java XML library is very verbose, so I would like to avoid it.
Upvotes: 3
Views: 2282
Reputation: 6178
kantan.xpath does just that. Here's something I just typed in the REPL:
import kantan.xpath._
import kantan.xpath.ops._
"<a><b name='n1'></b></a>".evalXPath[Node]("//b[@name='n1']")
Here, the Node type parameter describes the type one expects to extract from the XML document. A perhaps more explicit example would be:
new URI("http://stackoverflow.com").evalXPath[List[URI]]("//a/@href")
This would download the Stack Overflow homepage, parse it as an XML Document (there's a NekoHTML module for HTML sanitisation) and extract the targets of all links.
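If adding a dependency is not an option, the call shape from the question can be approximated by hiding the JDK's javax.xml.xpath behind an implicit class. This is only a minimal sketch and says nothing about how kantan.xpath works internally. XPathSyntax, XmlStringOps and the xpath method are hypothetical names. It assumes the input is an XML string rather than a scala.xml node, and it returns org.w3c.dom.Node values.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.{Node, NodeList}
import org.xml.sax.InputSource

object XPathSyntax {
  // Hypothetical helper: not part of kantan.xpath or scala.xml.
  implicit class XmlStringOps(private val xml: String) {
    // Parses the string as XML and returns every node matching the expression.
    def xpath(expression: String): List[Node] = {
      val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
      val doc = builder.parse(new InputSource(new StringReader(xml)))
      val result = XPathFactory.newInstance().newXPath()
        .evaluate(expression, doc, XPathConstants.NODESET)
        .asInstanceOf[NodeList]
      // NodeList is not a Scala collection, so copy it out by index.
      (0 until result.getLength).map(i => result.item(i)).toList
    }
  }
}
```

With XPathSyntax._ imported, "<a><b name='n1'></b></a>".xpath("//b[@name='n1']") returns a one-element list holding the b element. The expression is compiled on every call, which is fine for ad-hoc queries typed in by a user but wasteful for queries that are reused.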
Upvotes: 1
Reputation: 58800
Your best bet is (and always was, even in Java) to use JDOM. I've pimped JDOM with the following library to make it a bit more Scala-friendly:
import org.jdom._
import org.jdom.xpath._
import scala.collection.JavaConversions
import java.util._
import scala.collection.Traversable
package pimp.org.jdom{
  object XMLNamespace{
    def apply(prefix:String,uri:String) = Namespace.getNamespace(prefix,uri)
    def unapply(x:Namespace) = Some( (x.getPrefix, x.getURI) )
  }
  object XMLElement{
    implicit def wrap(e:Element) = new XMLElement(e)
    def unapply(x:Element) = Some( (x.getName, x.getNamespace) )
  }
  class XMLElement(underlying:Element){
    def attributes:java.util.List[Attribute] =
      underlying.getAttributes.asInstanceOf[java.util.List[Attribute]]
    def children:java.util.List[Element] =
      underlying.getChildren.asInstanceOf[java.util.List[Element]]
    def children(name: String): java.util.List[Element] =
      underlying.getChildren(name).asInstanceOf[java.util.List[Element]]
    def children(name: String, ns: Namespace): java.util.List[Element] =
      underlying.getChildren(name, ns).asInstanceOf[java.util.List[Element]]
  }
}
package pimp.org.jdom.xpath{
  import pimp.org.jdom._
  //instances of these classes are not thread safe when xpath variables are used
  class SingleNodeQuery[NType](val expression:String)(implicit namespaces:Traversable[Namespace]=null){
    private val compiled=XPath.newInstance(expression)
    if (namespaces!=null){
      for ( ns <- namespaces ) compiled.addNamespace(ns.getPrefix,ns.getURI)
    }
    def apply(startFrom:Any,variables:(String,String)*)={
      variables.foreach{ x=> compiled.setVariable(x._1,x._2)}
      compiled.selectSingleNode(startFrom).asInstanceOf[NType]
    }
  }
  class NodesQuery[NType](val expression:String)(implicit namespaces:Traversable[Namespace]=null){
    private val compiled=XPath.newInstance(expression)
    if (namespaces!=null){
      for ( ns <- namespaces ) compiled.addNamespace(ns.getPrefix,ns.getURI)
    }
    def apply(startFrom:Any,variables:(String,String)*)={
      variables.foreach{ x=> compiled.setVariable(x._1,x._2)}
      compiled.selectNodes(startFrom).asInstanceOf[java.util.List[NType]]
    }
  }
  class NumberValueQuery(val expression:String)(implicit namespaces:Traversable[Namespace]=null){
    private val compiled=XPath.newInstance(expression)
    if (namespaces!=null){
      for ( ns <- namespaces ) compiled.addNamespace(ns.getPrefix,ns.getURI)
    }
    def apply(startFrom:Any,variables:(String,String)*)={
      variables.foreach{ x=> compiled.setVariable(x._1,x._2)}
      compiled.numberValueOf(startFrom).intValue
    }
  }
  class ValueQuery(val expression:String)(implicit namespaces:Traversable[Namespace]=null){
    private val compiled=XPath.newInstance(expression)
    if (namespaces!=null){
      for ( ns <- namespaces ) compiled.addNamespace(ns.getPrefix,ns.getURI)
    }
    def apply(startFrom:Any,variables:(String,String)*)={
      variables.foreach{ x=> compiled.setVariable(x._1,x._2)}
      compiled.valueOf(startFrom)
    }
  }
}
My idea when I wrote this was that, in general, you want to compile each XPath query in advance so that it can be reused, and you want to state the type a query returns at the point where you write its text. JDOM's XPath class instead makes you pick one of four methods (selectSingleNode, selectNodes, numberValueOf or valueOf) at execution time.
Namespaces should be passed around implicitly (so you can specify them once and then forget about them), and XPath variable binding should be available at query time.
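The same two ideas, compiling once and binding variables per call, can be sketched without JDOM on top of the JDK's javax.xml.xpath. This is an illustration under stated assumptions, not the library above. CompiledValueQuery and CompiledValueQueryDemo are hypothetical names, the sample document is made up, and namespaces are left out to keep it short.

```scala
import java.io.StringReader
import javax.xml.namespace.QName
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathFactory, XPathVariableResolver}
import org.xml.sax.InputSource
import scala.collection.mutable

// Hypothetical class: mirrors the shape of ValueQuery, but uses javax.xml.xpath.
class CompiledValueQuery(val expression: String) {
  // The resolver reads these bindings each time the compiled expression runs.
  private val bindings = mutable.Map.empty[String, String]
  private val xpath = XPathFactory.newInstance().newXPath()
  xpath.setXPathVariableResolver(new XPathVariableResolver {
    def resolveVariable(name: QName): AnyRef = bindings.get(name.getLocalPart).orNull
  })
  // Compiled once, after the resolver has been installed.
  private val compiled = xpath.compile(expression)

  // Not thread safe, for the same reason as the JDOM version: shared mutable bindings.
  def apply(startFrom: AnyRef, variables: (String, String)*): String = {
    bindings.clear()
    bindings ++= variables
    compiled.evaluate(startFrom)
  }
}

object CompiledValueQueryDemo {
  def main(args: Array[String]): Unit = {
    // Made-up sample document, without namespaces.
    val xml = "<region><anchor role='start' href='#a1'/><anchor role='end' href='#a2'/></region>"
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
      .parse(new InputSource(new StringReader(xml)))
    val anchorQuery = new CompiledValueQuery("/region/anchor[@role=$role]/@href")
    println(anchorQuery(doc, "role" -> "start")) // prints #a1
    println(anchorQuery(doc, "role" -> "end"))   // prints #a2
  }
}
```

The return type is fixed by the class (String here), so callers never choose an evaluation method at the call site.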
You'd use the library like this. (The explicit type annotations could be left out and inferred; I've included them for illustration only.)
val S = XMLNamespace("s","http://www.nist.gov/speech/atlas")
val XLink = XMLNamespace("xlink", "http://www.w3.org/1999/xlink")
implicit val xmlns= List(S, XLink)
private val anchorQuery=new ValueQuery("s:AnchorRef[@role=$role]/@xlink:href")
val start:String=anchorQuery(region,"role"->"start")
val end:String=anchorQuery(region,"role"->"end")
//or
private val annotationQuery=new NodesQuery[Element]("/s:Corpus/s:Analysis/s:AnnotationSet/s:Annotation")
//iterating the returned java.util.List needs import scala.collection.JavaConversions._ in scope
for(annotation:Element <- annotationQuery(doc)) {
  //do something with it
}
I guess I should come up with some way of releasing this to the public.
Upvotes: 4