Vesna

Reputation: 345

clj-xpath for xml with simple and nested tags

I have a function, contents-only, that extracts content from XML using the clj-xpath library.

(ns example
  (:use [clj-xpath.core]))

(def data-url
  "http://api.eventful.com/rest/events/search?app_key=4H4Vff4PdrTGp3vV&keywords=music&location=New+York&date=Future")

(defn xml-data [url] (slurp url))

(defn defxmldoc [url]
  (xml->doc (xml-data url)))

(defn contents-only [url root-tag tags]
  (vec (map (fn [item]
              (into {}
                    (map (fn [tag]
                           [tag ($x:text (str "./" (name tag)) item)])
                         tags)))
            (take 5 ($x root-tag (defxmldoc url))))))

The function call looks like this:

(contents-only data-url "/search/events/event" [:title :url])

It works fine with non-nested tags, but not when I try to extract text from a nested tag, e.g. name in:

<performers>
  <performer>
    <id>P0-001-000009049-1</id>
    <url>...</url>
    <name>Lindsey Buckingham</name>
    <short_bio>Rock</short_bio>
    <creator>TomAzoff</creator>
    <linker>evdb</linker>
  </performer>
</performers>

With this function call:

(contents-only data-url "/search/events/event" [:title :url :name])

I get the following error:

RuntimeException Error, more (or less) than 1 result (0) from xml({:children...) for xpath(./name) clj-xpath.core/throwf (core.clj:26)

How can I change my contents-only function so that I can pass a nested tag as well?

Upvotes: 2

Views: 148

Answers (1)

T.Gounelle

Reputation: 6033

The quickest fix: change "./" to ".//" in the contents-only function.

user> (first (contents-only data-url "/search/events/event" [:title :id :name]))
{:title "Legally Blonde the Musical", :id "P0-001-000351944-7", :name "Legally Blonde The Musical"}
user> 

As explained in the XPath documentation, .//name selects all name elements that are descendants of the current node, at any depth in the hierarchy.
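
To make the difference between "./" and ".//" concrete, here is a minimal sketch. It assumes clj-xpath is on the classpath and uses a hypothetical, trimmed-down event (sample-xml and the example.axes namespace are made up for illustration, not taken from the real feed):

```clojure
(ns example.axes
  (:require [clj-xpath.core :refer [$x $x:text xml->doc]]))

;; Hypothetical, trimmed-down event; the real feed has many more tags.
(def sample-xml
  (str "<event>"
       "<title>Some Show</title>"
       "<performers><performer><name>Lindsey Buckingham</name></performer></performers>"
       "</event>"))

;; $x returns a seq of node maps; take the single <event> node.
(def event (first ($x "/event" (xml->doc sample-xml))))

($x:text "./title" event)  ; direct child => "Some Show"
($x:text ".//name" event)  ; descendant at any depth => "Lindsey Buckingham"

;; ($x:text "./name" event) throws, because <event> has no direct <name> child.
```

The "./" form only looks at direct children of the event node, which is why the original call fails for name.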

If name is not unique within an event, this may not be what you want. One alternative is to be explicit about the path you specify, e.g.

(contents-only data-url "/search/events/event"
                [[:title]
                 [:performers :performer :id]
                 [:performers :performer :name]])

and to add some helper functions such as:

(defn build-path
  ([sep kys] (build-path nil sep kys))
  ([root sep kys]
   (->> kys (map name) (interpose sep) 
        (concat (when root (list root sep))) (apply str))))

(defn path
  "build a path from a collection"
  [t]
  (build-path "." \/ t))

user> (path [:performers :performer :id])
"./performers/performer/id"

(defn path-key
  "Transform [:a :b :c] into :a-b-c"
  [t]
  (->> t (build-path \-) keyword))

user> (path-key [:performers :performer :id])
:performers-performer-id

Then contents-only becomes:

(defn contents-only2 [url root-tag tags]
  (vec (map (fn [item]
              (into {}
                    (map (fn [tag]
                           [(path-key tag) ($x:text (path tag) item)])
                         tags)))
            (take 5 ($x root-tag (defxmldoc url))))))

and the result is:

user> (first (contents-only2 data-url "/search/events/event"
                      [[:title]
                       [:performers :performer :id]
                       [:performers :performer :name]]))
{:title "Legally Blonde the Musical", :performers-performer-id "P0-001-000351944-7", :performers-performer-name "Legally Blonde The Musical"}
user> 
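
One caveat: as the error message in the question indicates, $x:text expects exactly one match, so an event with no performers, or with several, would still throw even with an explicit path. A possible workaround, sketched below, is the $x:text* variant, which to my knowledge returns a sequence of zero or more text values. The sample XML, the example.multi namespace and the performer-names function are hypothetical names used only for illustration:

```clojure
(ns example.multi
  (:require [clj-xpath.core :refer [$x $x:text* xml->doc]]))

;; Hypothetical sample: one event with two performers, one event with none.
(def sample-xml
  (str "<events>"
       "<event><title>A</title><performers>"
       "<performer><name>P1</name></performer>"
       "<performer><name>P2</name></performer>"
       "</performers></event>"
       "<event><title>B</title><performers/></event>"
       "</events>"))

(defn performer-names
  "Return one map per event with its title and a vector of all performer names.
  Uses $x:text* so that zero or several matches do not throw."
  [xml-string]
  (mapv (fn [event]
          {:title (first ($x:text* "./title" event))
           :names (vec ($x:text* "./performers/performer/name" event))})
        ($x "/events/event" (xml->doc xml-string))))

(performer-names sample-xml)
```

With this approach, :names is simply empty for events without performers; whether a vector of names or a single joined string fits better depends on how the result is consumed.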

Upvotes: 2
