Vesna

Reputation: 345

Parsing xml in clojure with arbitrary tags using clj-xpath

I'm trying to parse some XML using clj-xpath. Basically, I want to write a function that looks like this:

(map
         (fn [item]
           {:title ($x:text "./title" item)
            :url  ($x:text "./url" item)})
         (take 5
               ($x "/search/events/event" (xmldoc))))

but works with arbitrary tags. So far, I have this:

(ns mashup-dsl.datamodel
  (:use [clj-xpath.core]))
(def data-url "http://api.eventful.com/rest/events/search?app_key=4H4Vff4PdrTGp3vV&keywords=music&location=Belgrade&date=Future")

(def events-xml
 (fn [] (slurp data-url)))

(def xmldoc
  (fn [] (xml->doc (events-xml))))

(def item (take 5 ($x "/search/events/event" (xmldoc))))

(defn create-xpath [tag] (str "./" tag))

(def tags ["title" "url"])

(defn parse [item]
    (doseq [tag tags])(into {} (keyword tag) ($x:text (create-xpath tag) item)))

But I'm getting this error: TransformerException Extra illegal tokens: '$', 'tag', '@', '64516c52' org.apache.xpath.compiler.XPathParser.error (XPathParser.java:610). So the problem is in the parse function. Any ideas?

Upvotes: 2

Views: 395

Answers (2)

Nicolas Modrzyk

Reputation: 14197

The simplest form would be:

  (def url 
      (str 
          "http://api.eventful.com/rest/events/search?"
          "app_key=4H4Vff4PdrTGp3vV&"
          "keywords=music&"
          "location=Tokyo&"
          "date=Future"))
  (def xml (slurp url))
  (def event-titles (map #($x:text "./title" %) ($x "//event" xml)))

And the printout of event-titles would be:

("FLOPPY 10th Anniversary 「This is computer music」" "IN BUSINESS" "UNIT 10th Anniversary Erection" "In The Mix at 0" "\" 20140530 - Sick Team Release Party \"" "Fanfare Ciocarlia @ World Beat Festival" "Fanfare Ciocarlia @ Musashino Hall" "DBS presents PINCH Birthday Bash!!!" "BLUES SISTERS (from RESPECT)" "UNIST 2nd Album「Acoustic」リリースパーティー 「リリースしちゃってウカれNight(ドヤッ)☆」")

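EDIT: For a more versatile function, you could define:

(defn search-for [tag local-path]
  ;; Find every <tag> element in the xml defined above, then read
  ;; local-path (e.g. "./title" or "@id") relative to each match.
  (map #($x:text local-path %) ($x (str "//" tag) xml)))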

and use it like:

 (search-for "event" "@id")

or

 (search-for "event" "./title")

or

 (search-for "image" "./url")

Upvotes: 3

edbond

Reputation: 3951

Here is how to extract the first 5 titles:

user=> (map #($x:text "./title" %) (take 5 ($x "//event" (xmldoc))))
("9th International Belgrade Early Music Festival" "Belgrade Baroque Academy, Mijanovic, Gosta / 9th Belgrade Early Music Festival / Monteverdi: \"L'Incoronazione di Poppea\"" "Belgrade Baroque Academy, Mijanovic, Gosta / 9th Belgrade Early Music Festival / Monteverdi: \"L'Incoronazione di Poppea\"" "ICTM Study Group on Music and Dance in Southeastern Europe Conference" "New Belgrade Opera, Madlenianum Opera-Theatre, New Trinity Baroque; Mijanovic, Gosta / 9th Belgrade Early Music Festival / Monteverdi: \"L'incoronazione di Poppea\"")

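In your example, the doseq form is closed too early, so tag is no longer the loop variable by the time the into form runs. doseq is also meant for side effects and returns nil, so it cannot build the map. In addition, you need to compile the expression to use it against the xml->doc result.

You can create a helper function that returns a function to extract the text of a given tag: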
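
The doseq problem mentioned above can be sketched as follows. This is a minimal sketch, assuming the goal is one map per event keyed by tag name. The inline XML is a made-up sample standing in for the Eventful response, and parse-item is a hypothetical name; only $x and $x:text come from clj-xpath.

```clojure
(require '[clj-xpath.core :refer [$x $x:text]])

;; Made-up sample standing in for the API response (an assumption,
;; not real Eventful data).
(def sample-xml
  (str "<search><events>"
       "<event id=\"e1\"><title>First</title>"
       "<url>http://example.com/1</url></event>"
       "<event id=\"e2\"><title>Second</title>"
       "<url>http://example.com/2</url></event>"
       "</events></search>"))

(def tags ["title" "url"])

(defn create-xpath [tag] (str "./" tag))

;; Hypothetical replacement for the question's parse function.
;; `for` yields one [keyword text] pair per tag, and `into` pours the
;; pairs into a map. `tag` is used inside the form that binds it.
(defn parse-item [item]
  (into {}
        (for [tag tags]
          [(keyword tag) ($x:text (create-xpath tag) item)])))

(map parse-item ($x "/search/events/event" sample-xml))
```

Using for instead of doseq matters here because the pairs have to be returned as a value before into can consume them.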

(defn tag-fn [tag] (partial $x:text tag))

Now, you can generate functions for "title" and "url":

user=> (tag-fn "title")
#<core$partial$fn__4190 clojure.core$partial$fn__4190@71cc2b7a>

and

user=> (map (tag-fn "title") (take 5 ($x "//event" (xmldoc))))
("9th International Belgrade Early Music Festival" "Belgrade Baroque Academy, Mijanovic, Gosta / 9th Belgrade Early Music Festival / Monteverdi: \"L'Incoronazione di Poppea\"" "Belgrade Baroque Academy, Mijanovic, Gosta / 9th Belgrade Early Music Festival / Monteverdi: \"L'Incoronazione di Poppea\"" "ICTM Study Group on Music and Dance in Southeastern Europe Conference" "New Belgrade Opera, Madlenianum Opera-Theatre, New Trinity Baroque; Mijanovic, Gosta / 9th Belgrade Early Music Festival / Monteverdi: \"L'incoronazione di Poppea\"")

or url and title:

user=> (map (juxt (tag-fn "url") (tag-fn "title")) (take 2 ($x "//event" (xmldoc))))
(["http://eventful.com/belgrade/events/9th-international-belgrade-/E0-001-064654999-7@2014061420?utm_source=apis&utm_medium=apim&utm_campaign=apic" "9th International Belgrade Early Music Festival"] ["http://eventful.com/belgrade/events/belgrade-baroque-academy-mijanovic-gosta-9th-belg-/E0-001-059734872-8?utm_source=apis&utm_medium=apim&utm_campaign=apic" "Belgrade Baroque Academy, Mijanovic, Gosta / 9th Belgrade Early Music Festival / Monteverdi: \"L'Incoronazione di Poppea\""])

or the same, generating the extractors from a vector of tag names:

user=> (map (apply juxt (map tag-fn ["url" "title"])) (take 2 ($x "//event" (xmldoc))))
(["http://eventful.com/belgrade/events/9th-international-belgrade-/E0-001-064654999-7@2014061420?utm_source=apis&utm_medium=apim&utm_campaign=apic" "9th International Belgrade Early Music Festival"] ["http://eventful.com/belgrade/events/belgrade-baroque-academy-mijanovic-gosta-9th-belg-/E0-001-059734871-9?utm_source=apis&utm_medium=apim&utm_campaign=apic" "Belgrade Baroque Academy, Mijanovic, Gosta / 9th Belgrade Early Music Festival / Monteverdi: \"L'Incoronazione di Poppea\""])
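
If maps with keyword keys are wanted, as in the question, the juxt approach can be combined with zipmap. This is only a sketch: extractor is a hypothetical helper name, and the inline XML is a made-up sample rather than real API output. tag-fn is the helper defined earlier in this answer.

```clojure
(require '[clj-xpath.core :refer [$x $x:text]])

;; Made-up sample standing in for the API response (an assumption).
(def sample-xml
  (str "<search><events>"
       "<event><title>First</title><url>http://example.com/1</url></event>"
       "<event><title>Second</title><url>http://example.com/2</url></event>"
       "</events></search>"))

(defn tag-fn [tag] (partial $x:text tag))

;; Hypothetical helper: given tag names, return a function that turns
;; one event node into a map of keyword -> text.
(defn extractor [tags]
  (let [ks      (map keyword tags)
        get-all (apply juxt (map tag-fn tags))]
    (fn [item] (zipmap ks (get-all item)))))

(map (extractor ["url" "title"]) ($x "//event" sample-xml))
```

Building the juxt function once in the let and reusing it for every event avoids recreating the extractors per item.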

Upvotes: 3
