sumek

Reputation: 28132

clojure - simple way to parse xml with no attributes into a map

My XML doesn't use attributes or namespaces. Tags can be nested. I'd like to parse it into a Clojure map.

I'd like the tag names to be the keys. The values are either nested maps in case of nodes or text in case of a leaf.

What would be the simplest way to do that?

Upvotes: 3

Views: 711

Answers (2)

mattias

Reputation: 880

I have a lot of XML generated by .NET serializers for nested classes with values, arrays etc. Only elements and their content are used.

To me, the default Clojure XML structure with :tag, :content etc. is rather big and messy to work with, and it is easy to get confused when the object is deeply nested.

I use this function to create a simple intermediate representation, which I can then refine further depending on the type of the attributes.

I first parse the string or byte[] using clojure.data.xml/parse and then call keep-tag-and-contents-prepare-leafs:

(defn keep-tag-and-contents-prepare-leafs
  "Simplify the clj-xml structure, I am only interested in :tag and :content"
  [xml]
  (cond
    ;; an element: emit [tag simplified-content]
    (map? xml)
    [(:tag xml) (keep-tag-and-contents-prepare-leafs (:content xml))]

    ;; a content seq whose first item is an element: simplify each child
    (and (seq? xml) (map? (first xml)))
    (for [x xml] (keep-tag-and-contents-prepare-leafs x))

    ;; we are at the bottom of the xml: empty content or a single text value
    (seq? xml)
    (do
      (assert (<= (count xml) 1) "Leafs should be empty or single value")
      (first xml))

    ;; we should never end up here, since we look ahead on the level above the leafs
    :else
    (assert false)))

and I get a structure like this:

;; (pp/pprint (mutils/keep-tag-and-contents-prepare-leafs xlmeta-testdata-small))
;; [:defaultFormattings
;;  ([:_columnMeta
;;    ([:XLMetaColumn
;;      ([:_name "Paris"]
;;       [:_caption "Paris"]
;;       [:_width "100"]
;;       [:_hide "false"]
;;       [:_input "false"]
;;       [:_hideExport "false"]
;;       [:_textAreaRows "0"])]
;;     [:XLMetaColumn
;;      ([:_name "footbill40"]
;;       [:_caption "footbill40"]
;;       [:_width "100"]
;;       [:_hide "false"]
;;       [:_input "false"]
;;       [:_hideExport "false"]
;;       [:_textAreaRows "0"])])]
;;   [:_fmtStrings nil]
;;   [:_maxHtmlColumns "50"])]

This can easily be processed further: just dispatch on vector? and seq?.

This intermediate representation is also very compact, so it is easy to pprint a result, put a quote in front of it, and use it as expected data in unit tests.
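
As a rough sketch of that further processing, the [tag content] representation can be folded into the nested map the question asks for. The name pairs->map is hypothetical, and collecting repeated sibling tags (such as :XLMetaColumn) into a vector is an assumption about what is wanted, not part of the original function.

```clojure
(defn pairs->map
  "Hypothetical helper: turn the [tag content] representation into nested maps.
  Assumption: repeated sibling tags are collected into a vector."
  [node]
  (cond
    ;; a [tag content] pair becomes a single-entry map
    (vector? node)
    (let [[tag content] node]
      {tag (pairs->map content)})

    ;; a seq of pairs: merge the children, grouping repeated tags
    (seq? node)
    (reduce (fn [acc child]
              (let [[tag value] (first (pairs->map child))]
                (if (contains? acc tag)
                  (update acc tag #(if (vector? %) (conj % value) [% value]))
                  (assoc acc tag value))))
            {}
            node)

    ;; leaf text or nil is returned unchanged
    :else node))

(def sample
  '[:defaultFormattings
    ([:_columnMeta
      ([:XLMetaColumn ([:_name "Paris"] [:_width "100"])]
       [:XLMetaColumn ([:_name "footbill40"] [:_width "100"])])]
     [:_fmtStrings nil]
     [:_maxHtmlColumns "50"])])

(pairs->map sample)
```

With this grouping rule a tag maps to a single value when it occurs once and to a vector when it occurs several times, so code that reads the result has to handle both shapes.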

Upvotes: 1

Frank Henard

Reputation: 3638

I use this: clojure.xml/parse

The problem you might be finding is that the output isn't structured the way you want your map. You will have to do some kind of transformation from the clojure.xml map to your map.
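
A minimal sketch of such a transformation, assuming attribute-free XML as in the question. The names element->value and xml-str->map are hypothetical, mixed content (text next to child elements) is ignored, and a repeated sibling tag simply overwrites the earlier one.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

(defn element->value
  "Hypothetical helper: the value for one clojure.xml element.
  Child elements become a nested map; a leaf becomes its text (or nil)."
  [{:keys [content]}]
  (let [children (filter map? content)]
    (if (seq children)
      ;; node: one key per child tag (assumption: a later duplicate wins)
      (reduce (fn [acc child]
                (assoc acc (:tag child) (element->value child)))
              {}
              children)
      ;; leaf: the first text chunk, nil for an empty element
      (first content))))

(defn xml-str->map
  "Parse an XML string with clojure.xml/parse and wrap the root tag as the top-level key."
  [^String s]
  (let [root (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8")))]
    {(:tag root) (element->value root)}))

(xml-str->map "<a><b>1</b><c><d>x</d></c></a>")
```

If the XML contains lists of repeated elements, the overwrite rule loses data, which is where knowledge of the intended structure (a schema) becomes necessary.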

I tried creating some kind of generic translator, but I ended up realizing that I needed something that defines the structure of the XML (a schema). I then looked for a Clojure project that used XSD to transform the XML for me. Nothing well-supported existed at that time. So I ended up just writing Clojure to do the transformation, which thanks to Clojure turned out to be really easy (also I'd rather write Clojure than XSD). That explains the lack of a Clojure XML schema transformation library.

Something that crossed my mind, and that I think would be really interesting if it could work, is to define the schema in prismatic/schema and use that schema to transform the map. Then you would also get the validation and other features from prismatic.

Upvotes: 2
