greole

Reputation: 4771

Parsing XML in Haskell

I am trying to learn some Haskell and want to parse XML files with the following structure:

<properties>
  <property name="a">
    <value>1</value>
  </property>
  <property name="b">
    <value>2</value>
  </property>
</properties>

Following the example from the wiki, I can search for all properties with

runX (readDocument [withValidate no] "my.xml"
           >>> deep (isElem >>> hasName "properties"))

but how can I extract only the value element of the property with name="b"?

Upvotes: 2

Views: 565

Answers (2)

greole

Reputation: 4771

TagSoup did indeed do the job for me. Based on the tutorial, I came up with

module Main where
import Text.HTML.TagSoup

searchXML :: IO ()
searchXML = do
      rsp <- readFile "test.xml"
      let tags  = parseTags rsp
      let links = extr "value" [] $
                  extr "property" [("name","b")] tags
      let value = fromTagText $ links !! 0
      putStr value
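      where
          -- Tags strictly between the first opening tag `a` carrying
          -- attributes `b` and the next closing tag `a`.
          extr a b c = drop 1 $ takeWhile (~/= TagClose a) $
                       dropWhile (~/= TagOpen a b) c

main :: IO ()
main = searchXML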

which prints just the value 2. But I am pretty sure the code can be simplified a lot.
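
One possible simplification, as a sketch rather than a definitive version: TagSoup's `sections` can replace the hand-written `dropWhile`, and returning a `Maybe` avoids the partial `!! 0`. The names `valueOf` and `sample` are hypothetical, and the document is inlined as a string here instead of being read from test.xml so the example is self-contained.

```haskell
module Main where

import Text.HTML.TagSoup

-- Hypothetical helper: text of the first <value> inside the first
-- <property> whose name attribute equals the given name.
valueOf :: String -> String -> Maybe String
valueOf name xml =
  case sections (~== TagOpen "property" [("name", name)]) (parseTags xml) of
    [] -> Nothing
    (prop : _) ->
      -- Keep only the tags up to the closing </property>.
      let body = takeWhile (~/= TagClose "property") prop
      in  case dropWhile (~/= TagOpen "value" []) body of
            (_ : TagText txt : _) -> Just txt
            _                     -> Nothing

-- Inline copy of the document from the question (an assumption made
-- for the sake of a runnable example).
sample :: String
sample = unlines
  [ "<properties>"
  , "  <property name=\"a\">"
  , "    <value>1</value>"
  , "  </property>"
  , "  <property name=\"b\">"
  , "    <value>2</value>"
  , "  </property>"
  , "</properties>"
  ]

main :: IO ()
main = print (valueOf "b" sample)
```

A missing property or an empty value element yields `Nothing` instead of a runtime error.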

Upvotes: 1

To be honest, I find HXT quite a complex library to use. My understanding so far is that you transform one document into another using a chain of arrows.

If you want to learn arrows, you may find that my solution is cheating, but it did the job for me: I just use XPath (cabal install hxt-xpath) and produce an output document.

import Text.XML.HXT.Core
import Text.XML.HXT.XPath.Arrows 

main :: IO ()
main = do
      runX $ readDocument [] "my.xml"
            >>>
            root [] [ selem "values" [getXPathTrees "/properties/property[@name=\"b\"]/value"]]
            >>>
            writeDocument [withIndent yes] "out.xml"
      return ()

yielding

<?xml version="1.0" encoding="UTF-8"?>
<values>
  <value>2</value>
</values>
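
For comparison, the same selection can be sketched with plain HXT arrows and no XPath. This is a minimal sketch under stated assumptions: `valuesOf` and `sample` are hypothetical names, and the document is parsed from an inline string with `readString` rather than read from my.xml so the example runs on its own.

```haskell
module Main where

import Text.XML.HXT.Core

-- Hypothetical helper: text content of every <value> child of a
-- <property> element whose name attribute equals the given name.
valuesOf :: ArrowXml a => String -> a XmlTree String
valuesOf name =
  deep (isElem >>> hasName "property" >>> hasAttrValue "name" (== name))
    /> hasName "value"   -- step to the <value> children
    /> getText           -- step to their text nodes

-- Inline copy of the document from the question (an assumption made
-- for the sake of a runnable example).
sample :: String
sample = unlines
  [ "<properties>"
  , "  <property name=\"a\">"
  , "    <value>1</value>"
  , "  </property>"
  , "  <property name=\"b\">"
  , "    <value>2</value>"
  , "  </property>"
  , "</properties>"
  ]

main :: IO ()
main = do
  vals <- runX (readString [withValidate no] sample >>> valuesOf "b")
  print vals
```

Here the result is an ordinary Haskell list of strings rather than a new XML document, which may be more convenient when the value is needed inside the program.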

Upvotes: 3
