tommy chheng

Reputation: 9228

Is there a way to perform an XPath string query using Scala's XML library?

Given a Scala XML object, can I perform an XPath string query like "//entries[@title='scala']"?

Ideally, it would be like:

<a><b name='n1'></b></a>.xpath("//b[@name='n1']")

I can't manually convert all the XPath queries to Scala's internal XPath-like method calls, because my program accepts XPath queries dynamically.

Additionally, the built-in Java XML library is very verbose, so I would like to avoid it.

Upvotes: 3

Views: 2282

Answers (2)

Nicolas Rinaudo

Reputation: 6178

kantan.xpath does just that. Here's something I just typed in the REPL:

import kantan.xpath._
import kantan.xpath.ops._

"<a><b name='n1'></b></a>".evalXPath[Node]("//b[@name='n1']")

The Node type parameter describes the type one expects to extract from the XML document. A perhaps more explicit example would be:

import java.net.URI

new URI("http://stackoverflow.com").evalXPath[List[URI]]("//a/@href")

This would download the Stack Overflow homepage, evaluate it as an XML document (there's a NekoHTML module for HTML sanitisation), and extract the targets of all links.

Upvotes: 1

Ken Bloom

Reputation: 58800

Your best bet is (and always was, even in Java) to use JDOM. I've pimped JDOM with the following library to make it a bit more Scala-friendly:

import org.jdom._
import org.jdom.xpath._
import scala.collection.JavaConversions
import java.util._
import scala.collection.Traversable


package pimp.org.jdom{
   object XMLNamespace{
      def apply(prefix:String,uri:String) = Namespace.getNamespace(prefix,uri)
      def unapply(x:Namespace) = Some( (x.getPrefix, x.getURI) )
   }
   object XMLElement{
      implicit def wrap(e:Element) = new XMLElement(e)
      def unapply(x:Element) = Some( (x.getName, x.getNamespace) )
   }
   class XMLElement(underlying:Element){
      def attributes:java.util.List[Attribute] =
         underlying.getAttributes.asInstanceOf[java.util.List[Attribute]]
      def children:java.util.List[Element] =
         underlying.getChildren.asInstanceOf[java.util.List[Element]]
      def children(name: String): java.util.List[Element] =
         underlying.getChildren(name).asInstanceOf[java.util.List[Element]]
      def children(name: String, ns: Namespace): java.util.List[Element] =
         underlying.getChildren(name, ns).asInstanceOf[java.util.List[Element]]
   }
}

package pimp.org.jdom.xpath{
   import pimp.org.jdom._

   //instances of these classes are not thread safe when xpath variables are used

   class SingleNodeQuery[NType](val expression:String)(implicit namespaces:Traversable[Namespace]=null){
      private val compiled=XPath.newInstance(expression)

      if (namespaces!=null){
         for ( ns <- namespaces ) compiled.addNamespace(ns.getPrefix,ns.getURI)
      }

      def apply(startFrom:Any,variables:(String,String)*)={
         variables.foreach{ x=> compiled.setVariable(x._1,x._2)}
         compiled.selectSingleNode(startFrom).asInstanceOf[NType]
      }
   }

   class NodesQuery[NType](val expression:String)(implicit namespaces:Traversable[Namespace]=null){
      private val compiled=XPath.newInstance(expression)

      if (namespaces!=null){
         for ( ns <- namespaces ) compiled.addNamespace(ns.getPrefix,ns.getURI)
      }

      def apply(startFrom:Any,variables:(String,String)*)={
         variables.foreach{ x=> compiled.setVariable(x._1,x._2)}
         compiled.selectNodes(startFrom).asInstanceOf[java.util.List[NType]]
      }
   }

   class NumberValueQuery(val expression:String)(implicit namespaces:Traversable[Namespace]=null){
      private val compiled=XPath.newInstance(expression)

      if (namespaces!=null){
         for ( ns <- namespaces ) compiled.addNamespace(ns.getPrefix,ns.getURI)
      }

      def apply(startFrom:Any,variables:(String,String)*)={
         variables.foreach{ x=> compiled.setVariable(x._1,x._2)}
         compiled.numberValueOf(startFrom).intValue
      }
   }

   class ValueQuery(val expression:String)(implicit namespaces:Traversable[Namespace]=null){
      private val compiled=XPath.newInstance(expression)

      if (namespaces!=null){
         for ( ns <- namespaces ) compiled.addNamespace(ns.getPrefix,ns.getURI)
      }

      def apply(startFrom:Any,variables:(String,String)*)={
         variables.foreach{ x=> compiled.setVariable(x._1,x._2)}
         compiled.valueOf(startFrom)
      }
   }

}

My idea when I wrote this was that, in general, you want to compile each XPath query in advance (so that it can be reused), and you want to specify the type the query returns at the point where you write the query text (unlike JDOM's XPath class, where you pick one of four methods to call at execution time).

Namespaces should be passed around implicitly (so you can specify them once and then forget about them), and XPath variable binding should be available at query time.

You'd use the library like this. (The explicit type annotations are optional, since the types can be inferred; I've included them for illustration only.)

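import org.jdom.Element
import pimp.org.jdom._
import pimp.org.jdom.xpath._
//implicit Java-to-Scala collection conversions, needed to iterate over the java.util.List results below
import scala.collection.JavaConversions._

val S = XMLNamespace("s","http://www.nist.gov/speech/atlas")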
val XLink = XMLNamespace("xlink", "http://www.w3.org/1999/xlink")
implicit val xmlns= List(S, XLink)

private val anchorQuery=new ValueQuery("s:AnchorRef[@role=$role]/@xlink:href")

val start:String=anchorQuery(region,"role"->"start")
val end:String=anchorQuery(region,"role"->"end")

//or

private val annotationQuery=new NodesQuery[Element]("/s:Corpus/s:Analysis/s:AnnotationSet/s:Annotation")

for(annotation:Element <- annotationQuery(doc)) {
  //do something with it
}

I guess I should come up with some way of releasing this to the public.
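
The design described in this answer (compile each query once, fix the result type when the query is defined, bind XPath variables at call time) can also be sketched without JDOM, on top of the JDK's javax.xml.xpath. This is only a minimal sketch under some assumptions: the names XPathSketch, Query, valueQuery and nodesQuery are made up for illustration, namespaces are left out, and the input is a DOM document rather than a scala.xml or JDOM tree.

```scala
import java.io.StringReader
import javax.xml.namespace.QName
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory, XPathVariableResolver}
import org.w3c.dom.{Document, Node, NodeList}
import org.xml.sax.InputSource

// Hypothetical names throughout; a JDK-only sketch, not the JDOM wrapper above.
object XPathSketch {

  // Parse an XML string into a DOM document (namespaces are ignored for brevity).
  def parse(xml: String): Document =
    DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new InputSource(new StringReader(xml)))

  // Compiles the expression once. The result type is fixed when the query is
  // created; XPath variables are bound on each call. Because the bindings live
  // in a mutable field, an instance is not thread-safe.
  final class Query[A](expression: String, returnType: QName)(convert: AnyRef => A) {
    private var bindings = Map.empty[String, String]

    private val compiled = {
      val xpath = XPathFactory.newInstance().newXPath()
      // The resolver has to be installed before compile() is called.
      xpath.setXPathVariableResolver(new XPathVariableResolver {
        def resolveVariable(name: QName): AnyRef =
          bindings.getOrElse(name.getLocalPart, null)
      })
      xpath.compile(expression)
    }

    def apply(startFrom: AnyRef, variables: (String, String)*): A = {
      bindings = variables.toMap
      convert(compiled.evaluate(startFrom, returnType))
    }
  }

  // A query that returns the string value of the first match.
  def valueQuery(expression: String): Query[String] =
    new Query[String](expression, XPathConstants.STRING)(_.asInstanceOf[String])

  // A query that returns all matching nodes as a Scala List.
  def nodesQuery(expression: String): Query[List[Node]] =
    new Query[List[Node]](expression, XPathConstants.NODESET)({ raw =>
      val nodes = raw.asInstanceOf[NodeList]
      List.tabulate(nodes.getLength)(i => nodes.item(i))
    })

  def main(args: Array[String]): Unit = {
    val doc = parse("<a><b name='n1'>first</b><b name='n2'>second</b></a>")
    val textByName = valueQuery("//b[@name=$name]")
    println(textByName(doc, "name" -> "n1"))
    println(nodesQuery("//b")(doc).map(_.getNodeName))
  }
}
```

Since the expression is an ordinary string, it can come from user input at runtime, which is what the question asks for. The trade-off is that the data has to be a DOM document, so a scala.xml value would first need to be serialized and re-parsed.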

Upvotes: 4
