Reputation: 4326
This question is about how to parse XML content that carries xmlns attributes. I wrote code to parse it, and it works, but I would appreciate pointers on whether it can be done better.
I have an XML file, test.xml, as below:
<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body>
<SomeResponse xmlns="https://testsomestuff.org/API/WS/">
<SomeResult>
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;My &lt;b&gt;Title&lt;/b&gt;&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;p&gt;Foo bar baz&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
</SomeResult>
</SomeResponse>
</soap:Body></soap:Envelope>
I wrote the code below to parse the "SomeResult" content using xml-conduit:
{-# LANGUAGE OverloadedStrings #-}
import Prelude hiding (readFile)
import Text.XML
import Text.XML.Cursor
import qualified Data.Text as T
import Data.Text.Lazy.Builder (toLazyText)
import Data.Text.Lazy (fromStrict)

main :: IO ()
main = do
  doc <- readFile def "test.xml"
  let cursor = fromDocument doc
      -- Collect the escaped text under SomeResult, ignoring namespaces.
      res = fromStrict $ T.concat $
        child cursor >>= laxElement "Body"
          >>= child >>= laxElement "SomeResponse"
          >>= child >>= laxElement "SomeResult"
          >>= descendant >>= content
      -- Parse that text as a second document and read the title.
      pres = parseText_ def res
      cursor2 = fromDocument pres
      res2 = child cursor2 >>= element "head"
        >>= child >>= element "title"
        >>= descendant >>= content
  print res2
The output in ghci shows that it parses correctly:
*Main> main
["My ","Title"]
Is the laxElement approach a good way to locate the SomeResult content? If there is a better way, I would very much appreciate pointers.
Also, I need to do the escaping in the reverse direction: when building the request that produces the response above, the inner body has to be escaped (as it is under SomeResult in test.xml). Is that taken care of by default when building the request with Text.XML, or do I have to escape the inner body explicitly, using something like html-entities?
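One way to check the escaping question is a small experiment. The sketch below is only an illustration under assumptions: wrapInner is a hypothetical helper, the SomeResult element is built without its namespace, and the inner HTML is assumed to be available as already serialised text. It places that text in a NodeContent node and renders the document, so you can inspect whether the markup characters come out as entities.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Map as Map
import qualified Data.Text.Lazy.IO as TLIO
import Text.XML

-- Hypothetical helper: wrap already-serialised HTML in a SomeResult
-- element as a plain text node (not as child elements).
wrapInner :: Text -> Document
wrapInner inner = Document (Prologue [] Nothing []) root []
  where
    root = Element "SomeResult" Map.empty [NodeContent inner]

main :: IO ()
main = do
  let html = "<html><head><title>My <b>Title</b></title></head></html>" :: Text
  -- renderText serialises the document; text nodes are escaped by the
  -- renderer, so no explicit entity encoding is applied here.
  TLIO.putStrLn (renderText def (wrapInner html))
```

If the rendered output shows the inner markup as entities, a separate escaping step with html-entities would not be needed for this case.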
Upvotes: 1
Views: 211
Reputation: 216
Together with xml-conduit, I would suggest using a small 'lens' package such as xml-html-conduit-lens or xml-lens (the two are quite similar, but I chose the first after a quick look at the source). Namespaces are supported (see this issue).
You can look at one of my experimental projects if you need a more concrete example. From that project, here is a traversal that gets the information of a specific machine from the VCloud API:
fetchVM :: AsXmlDocument t => Text -> Traversal' t Element
fetchVM n = xml...ovfNode "VirtualSystem".attributed (ix (nsName ovfNS "id").only n)
You can then combine traversals as such:
vmId = raw ^. responseBody . fetchVM vmName . fetchVmId.text
Look at how ovfNode or nsName is defined to see how I handle namespaces.
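If adding a lens package is not an option, the same namespace-aware matching can be sketched with xml-conduit alone, as a stricter alternative to laxElement. This is a minimal sketch, not code from the project above: qname and someResultText are hypothetical names, and the two namespace URIs are copied from the test.xml in the question. It relies on Name carrying an optional namespace and on the cursor operators $/, &/ and &// from Text.XML.Cursor.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import Text.XML (Document, Name (..), def, parseText_)
import Text.XML.Cursor

-- Namespace URIs taken from the question's test.xml.
soapNS, apiNS :: Text
soapNS = "http://schemas.xmlsoap.org/soap/envelope/"
apiNS  = "https://testsomestuff.org/API/WS/"

-- Hypothetical helper: a namespace-qualified name with no prefix.
-- Name equality ignores the prefix, so this also matches soap:Body.
qname :: Text -> Text -> Name
qname ns local = Name local (Just ns) Nothing

-- Walk Envelope -> Body -> SomeResponse -> SomeResult by qualified name
-- and concatenate every text node found below SomeResult.
someResultText :: Document -> Text
someResultText doc =
  T.concat $
    fromDocument doc
      $/ element (qname soapNS "Body")
      &/ element (qname apiNS "SomeResponse")
      &/ element (qname apiNS "SomeResult")
      &// content

main :: IO ()
main = do
  let sample = TL.concat
        [ "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
        , "<soap:Body><SomeResponse xmlns=\"https://testsomestuff.org/API/WS/\">"
        , "<SomeResult>&lt;p&gt;Foo&lt;/p&gt;</SomeResult>"
        , "</SomeResponse></soap:Body></soap:Envelope>"
        ]
  print (someResultText (parseText_ def sample))
```

Unlike laxElement, which compares only local names case-insensitively, element with a qualified Name will not match a Body or SomeResult element that belongs to a different namespace.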
Here is another interesting article on the subject: https://www.schoolofhaskell.com/user/chad/snippets/random-code-snippets/xml-conduit-lens
Another tip is to stick with 'xml-conduit' (at least for now). Some have suggested taggy as a replacement, but unfortunately it is not under active development at the moment (see https://github.com/alpmestan/taggy/issues/14).
I hope it helps.
Upvotes: 1