Reputation: 133
My objective is to extract a list of two lists from this XML file:
<famous_people>
  <famous_person>
    <first_name>Wolfgang</first_name>
    <last_name>Goethe</last_name>
    <year_of_birth>1749</year_of_birth>
    <country_of_origin>Germany</country_of_origin>
  </famous_person>
  <famous_person>
    <first_name>Miguel</first_name>
    <last_name>Cervantes</last_name>
    <widely_known_for>Don Quixote</widely_known_for>
  </famous_person>
</famous_people>
The list I'm interested in extracting is:
[[("first_name","Wolfgang"),("last_name","Goethe"),("year_of_birth","1749"),("country_of_origin","Germany")],[("first_name","Miguel"),("last_name","Cervantes"),("widely_known_for","Don Quixote")]]
I've only managed to get to the point where all the tuples that I'm interested in are inside of one big flat list, as evidenced by this GHCi output:
Prelude> import Text.XML.HXT.Core
Prelude Text.XML.HXT.Core> import Text.HandsomeSoup
Prelude Text.XML.HXT.Core Text.HandsomeSoup> html <- readFile "test.html"
Prelude Text.XML.HXT.Core Text.HandsomeSoup> let doc = readString [] html
Prelude Text.XML.HXT.Core Text.HandsomeSoup> runX $ doc >>> getChildren >>> getChildren >>> getChildren >>> multi (getName &&& deep getText)
[("first_name","Wolfgang"),("last_name","Goethe"),("year_of_birth","1749"),("country_of_origin","Germany"),("first_name","Miguel"),("last_name","Cervantes"),("widely_known_for","Don Quixote")]
How do I obtain the desired list of two lists?
Upvotes: 2
Views: 566
Reputation: 568
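I used the listA function, which collects all results of the arrow it wraps into a single list. Applying it once per famous_person element gives one list of tuples per person instead of one flat list. Here is my code: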
module Famous where

import Text.XML.HXT.Core (isElem, hasName, getChildren, getText, listA, runX, readDocument, getName)
import Control.Arrow.ArrowTree (deep)
import Control.Arrow ((>>>), (&&&))
import Text.XML.HXT.Arrow.XmlArrow (ArrowXml)
import Text.XML.HXT.DOM.TypeDefs (XmlTree)

-- Select every element with the given tag name, at any depth.
atTag :: ArrowXml a => String -> a XmlTree XmlTree
atTag tag = deep (isElem >>> hasName tag)

-- For each famous_person, collect its (tag name, text) pairs into one list.
parseFamous :: ArrowXml a => a XmlTree [(String, String)]
parseFamous = atTag "famous_person" >>> listA (getChildren >>>
    (getName &&& (getChildren >>> getText)))

main :: IO ()
main = do
    let path = "famous.xml"
    -- runX returns one result per famous_person, i.e. a list of lists.
    result <- runX (readDocument [] path >>> parseFamous)
    print result
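
For experimenting in GHCi without a file on disk, the same idea can be sketched as a pure function. This is only an illustration under a few assumptions of mine: the names sampleXml and extractPeople are hypothetical, the XML from the question is embedded as a string, and xread and runLA (both exported by Text.XML.HXT.Core) stand in for readDocument and runX. The extra isElem filter just makes explicit that whitespace-only text nodes between the tags are skipped.

```haskell
module Main where

import Text.XML.HXT.Core

-- Sample input copied from the question, held in memory for this sketch.
sampleXml :: String
sampleXml = unlines
  [ "<famous_people>"
  , "  <famous_person>"
  , "    <first_name>Wolfgang</first_name>"
  , "    <last_name>Goethe</last_name>"
  , "    <year_of_birth>1749</year_of_birth>"
  , "    <country_of_origin>Germany</country_of_origin>"
  , "  </famous_person>"
  , "  <famous_person>"
  , "    <first_name>Miguel</first_name>"
  , "    <last_name>Cervantes</last_name>"
  , "    <widely_known_for>Don Quixote</widely_known_for>"
  , "  </famous_person>"
  , "</famous_people>"
  ]

-- Hypothetical helper: one inner list per famous_person element.
-- listA wraps the per-person arrow, so that person's (name, text)
-- pairs arrive as a single list rather than as separate results.
extractPeople :: String -> [[(String, String)]]
extractPeople = runLA $
  xread
    >>> deep (isElem >>> hasName "famous_person")
    >>> listA (getChildren >>> isElem >>> (getName &&& (getChildren >>> getText)))

main :: IO ()
main = print (extractPeople sampleXml)
```

Moving listA outside the per-person selection (wrapping the whole pipeline) would instead produce one list containing every tuple, which is essentially the flat result from the question.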
Upvotes: 2