Reputation: 5500
If I have an XML document like this:
<root>
  <elem name="Greeting">Hello</elem>
  <elem name="Name">Name</elem>
</root>
and some Haskell type/data definitions like this:
type Name = String
type Value = String
data LocalizedString = LS Name Value
and I wanted to write a Haskell function with the following signature:
getLocalizedStrings :: String -> [LocalizedString]
where the first parameter was the XML text, and the returned value was:
[LS "Greeting" "Hello", LS "Name" "Name"]
how would I do this?
If HaXml is the best tool, how would I use HaXml to achieve the above goal?
Upvotes: 7
Views: 1916
Reputation: 138007
Use one of the XML packages.
The most popular are, in order,
Upvotes: 3
Reputation: 5500
Here's my second attempt (after receiving some good input from others) with TagSoup:
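-- Named Main (not Xml) so the file compiles to an executable with its own main.
module Main where
import Data.Char (isSpace)
import Text.HTML.TagSoup
type SName = String
type SValue = String
data LocalizedString = LS SName SValue
  deriving Show
getLocalizedStrings :: String -> [LocalizedString]
getLocalizedStrings = create . filterTags . parseTags
-- Keep only <elem ...> opening tags and text nodes.
-- Tag is parameterised over the string type in current TagSoup, hence Tag String.
filterTags :: [Tag String] -> [Tag String]
filterTags = filter (\x -> isTagOpenName "elem" x || isTagText x)
-- An <elem name="..."> immediately followed by a text node becomes one LS;
-- any other tag is skipped.
create :: [Tag String] -> [LocalizedString]
create (TagOpen "elem" [("name", name)] : TagText text : rest) =
  LS name (trimWhiteSpace text) : create rest
create (_:rest) = create rest
create [] = []
-- Strip leading and trailing whitespace.
trimWhiteSpace :: String -> String
trimWhiteSpace = dropWhile isSpace . reverse . dropWhile isSpace . reverse
main :: IO ()
main = do
  xml <- readFile "xml.xml" -- xml.xml contains the XML from the original question.
  print (getLocalizedStrings xml)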
The first attempt showcased a naive (and faulty) method for trimming whitespace from a string.
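The reverse-based trimWhiteSpace above is easy to check in isolation. The sketch below needs only base and puts it next to an alternative built on Data.List.dropWhileEnd, which removes the trailing whitespace without reversing the string twice. The names trimReverse and trimNoReverse are hypothetical, chosen for illustration.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Same definition as trimWhiteSpace above. Read right to left:
-- reverse, drop what was the trailing whitespace, reverse back,
-- then drop the leading whitespace.
trimReverse :: String -> String
trimReverse = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- Hypothetical alternative: dropWhile removes the leading run of
-- whitespace, then dropWhileEnd removes the trailing run in place.
trimNoReverse :: String -> String
trimNoReverse = dropWhileEnd isSpace . dropWhile isSpace
```

Both turn "\n    Hello\n  " into "Hello" and "  a b  " into "a b"; whitespace inside the string is left alone.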
Upvotes: 1
Reputation: 205034
I've never actually bothered to figure out how to extract bits out of XML documents using HaXml; HXT has met all my needs.
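{-# LANGUAGE Arrows #-}
import Data.Maybe (listToMaybe)
import Text.XML.HXT.Core -- HXT >= 9; older releases exposed this API as Text.XML.HXT.Arrow
type Name = String
type Value = String
data LocalizedString = LS Name Value
  deriving Show
-- runLA yields one result per <root> found; listToMaybe keeps the first, if any.
getLocalizedStrings :: String -> Maybe [LocalizedString]
getLocalizedStrings = listToMaybe . runLA (xread >>> getRoot)
-- Every element with the given name at or below the current node.
atTag :: ArrowXml a => String -> a XmlTree XmlTree
atTag tag = deep $ isElem >>> hasName tag
-- Collect the results for all <elem>s under <root> into a single list.
getRoot :: ArrowXml a => a XmlTree [LocalizedString]
getRoot = atTag "root" >>> listA getElem
-- Build an LS from an <elem>'s name attribute and its text child.
getElem :: ArrowXml a => a XmlTree LocalizedString
getElem = atTag "elem" >>> proc x -> do
  name <- getAttrValue "name" -< x
  value <- getChildren >>> getText -< x
  returnA -< LS name value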
You'd probably like a little more error-checking (e.g. don't just lazily use atTag like me; actually verify that <root> is the root, that each <elem> is a direct descendant, etc.), but this works just fine on your example.
Now, if you need an introduction to Arrows, unfortunately I don't know of any good one. I myself learned it the "thrown into the ocean to learn how to swim" way.
Something that may be helpful to keep in mind is that the proc syntax is simply sugar for the basic arrow operations (arr, >>>, etc.), just like do is simply sugar for the basic monad operations (return, >>=, etc.). The following are equivalent:
getAttrValue "name" &&& (getChildren >>> getText) >>^ uncurry LS
proc x -> do
  name <- getAttrValue "name" -< x
  value <- getChildren >>> getText -< x
  returnA -< LS name value
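To try this equivalence without installing HXT, the sketch below uses the ordinary function arrow (->) from Control.Arrow in place of HXT's list arrows. The Node type and the getName/getValue helpers are hypothetical stand-ins for getAttrValue "name" and getChildren >>> getText. The third definition spells out, roughly, what the proc block amounts to using only arr, first, second and >>>; GHC's actual translation differs in detail.

```haskell
{-# LANGUAGE Arrows #-}
import Control.Arrow

-- Hypothetical input: a (name attribute, text) pair instead of an XmlTree.
type Node = (String, String)

data LocalizedString = LS String String
  deriving (Show, Eq)

-- Hypothetical stand-ins for the HXT arrows; here they are plain functions.
getName, getValue :: Node -> String
getName = fst
getValue = snd

-- Combinator style: (&&&) feeds the same input to both arrows,
-- (>>^) post-composes a pure function.
viaCombinators :: Node -> LocalizedString
viaCombinators = getName &&& getValue >>^ uncurry LS

-- The same computation in proc notation.
viaProc :: Node -> LocalizedString
viaProc = proc x -> do
  name  <- getName  -< x
  value <- getValue -< x
  returnA -< LS name value

-- Hand-written version with only primitive combinators: duplicate the
-- input, run one arrow on each component, then combine with a pure function.
viaPrimitives :: Node -> LocalizedString
viaPrimitives =
  arr (\x -> (x, x)) >>> first getName >>> second getValue >>> arr (uncurry LS)
```

All three map ("Greeting", "Hello") to LS "Greeting" "Hello".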
Upvotes: 6