I'm trying to parse XML that looks like this:
<h2>Collection 1</h2>
<table>
<tr>Property 1</tr>
<tr>Property 2</tr>
</table>
<h2>Collection 2</h2>
<table>
<tr>Property 1</tr>
<tr>Property 88</tr>
</table>
I would like to parse that into the following values:
MyClass "Collection 1" "Property 1"
MyClass "Collection 1" "Property 2"
MyClass "Collection 2" "Property 1"
MyClass "Collection 2" "Property 88"
I'm not sure how to go about doing this. My first thought was something like `element "h2" $| followingSibling &// element "tr" &/ content`, but that doesn't work: it captures all of the `tr` elements, even the ones that don't belong to the table I'm trying to read from, so I can't tell which properties belong to which collection.
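A minimal sketch of that naive approach makes the over-capture concrete. It assumes the `xml-conduit` and `text` packages and wraps the snippet in a hypothetical `<body>` root element, because an XML document needs exactly one root. `naive` and `sample` are illustrative names, not part of any library.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Text (Text)
import qualified Data.Text.Lazy as TL
import Text.XML (def, parseText_)
import Text.XML.Cursor

-- The question's snippet inside a hypothetical <body> root element,
-- since an XML document must have exactly one root.
sample :: TL.Text
sample =
  TL.concat
    [ "<body>"
    , "<h2>Collection 1</h2>"
    , "<table><tr>Property 1</tr><tr>Property 2</tr></table>"
    , "<h2>Collection 2</h2>"
    , "<table><tr>Property 1</tr><tr>Property 88</tr></table>"
    , "</body>"
    ]

-- For each <h2>, collect the text of every <tr> below *any* later sibling.
-- followingSibling is not limited to the next table, so the rows of the
-- second table leak into the first collection.
naive :: Cursor -> [(Text, [Text])]
naive top =
  [ (coll, rows)
  | h2 <- top $/ element "h2"
  , coll <- h2 $/ content
  , let rows = h2 $| (followingSibling &// element "tr" &/ content)
  ]

main :: IO ()
main = mapM_ print (naive (fromDocument (parseText_ def sample)))
```

Running it, the entry for "Collection 1" lists all four rows (including "Property 88"), which is exactly the problem described above.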
How do I go about solving this?
You have to define your own XML axis for the "immediate sibling", because `followingSibling` returns every sibling node after the context node. That's possible because `Axis` in `Text.XML.Cursor` is a type synonym for `Cursor -> [Cursor]`:
immediateSibling :: Axis
immediateSibling = take 1 . (anyElement <=< followingSibling)  -- (<=<) is from Control.Monad
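To see the difference between the two axes, here is a small sketch, assuming the `xml-conduit` and `text` packages. The document and the `names` helper are hypothetical, used only to print which element nodes each axis selects; `anyElement` filters out text nodes such as the newlines between tags, so `take 1` really picks the next element.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Monad ((<=<))
import Data.Text (Text)
import Text.XML (Element (..), Name (..), Node (..), def, parseText_)
import Text.XML.Cursor

-- The answer's axis: the first following sibling that is an element.
immediateSibling :: Axis
immediateSibling = take 1 . (anyElement <=< followingSibling)

-- Hypothetical helper: local names of the element nodes an axis selects.
names :: Axis -> Cursor -> [Text]
names axis c =
  [ nameLocalName (elementName e) | NodeElement e <- map node (axis c) ]

-- Hypothetical document with whitespace text nodes between the elements.
sampleRoot :: Cursor
sampleRoot =
  fromDocument (parseText_ def "<r><h2>A</h2>\n<table/>\n<h2>B</h2>\n<table/></r>")

main :: IO ()
main =
  case sampleRoot $/ element "h2" of
    (firstH2 : _) -> do
      print (names followingSibling firstH2) -- ["table","h2","table"]
      print (names immediateSibling firstH2) -- ["table"]
    [] -> putStrLn "no <h2> found"
```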
Combining information from different levels is then just a nested list comprehension:
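-- root is the Cursor for the whole document, e.g. fromDocument doc
selected :: [(Text, Text)]
selected = root $/ selector
selector :: Cursor -> [(Text, Text)]
selector = element "h2" >=> toTuple
-- replace the tuple with your constructor
toTuple :: Cursor -> [(Text, Text)]
toTuple c = [ (coll, prop)
            | coll <- c $/ content
            , prop <- c $| (immediateSibling >=> element "table" &/ element "tr" &/ content) ]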
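Putting the pieces together, a self-contained sketch might look like the following. It assumes the `xml-conduit` and `text` packages, wraps the question's snippet in a hypothetical `<body>` root, and uses a hypothetical `MyClass` type with two `Text` fields in place of the tuple; `toMyClass` and `collections` are illustrative names.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Monad ((<=<), (>=>))
import Data.Text (Text)
import qualified Data.Text.Lazy as TL
import Text.XML (def, parseText_)
import Text.XML.Cursor

-- Hypothetical stand-in for the question's MyClass: collection, property.
data MyClass = MyClass Text Text
  deriving (Show, Eq)

-- First following sibling that is an element (skips whitespace text nodes).
immediateSibling :: Axis
immediateSibling = take 1 . (anyElement <=< followingSibling)

-- For one <h2>: pair its text with each <tr> of the table right after it.
toMyClass :: Cursor -> [MyClass]
toMyClass c =
  [ MyClass coll prop
  | coll <- c $/ content
  , prop <- c $| (immediateSibling >=> element "table" &/ element "tr" &/ content)
  ]

-- Apply toMyClass to every <h2> child of the root element.
collections :: Cursor -> [MyClass]
collections top = top $/ (element "h2" >=> toMyClass)

-- The question's snippet, one tag per line, inside a hypothetical root.
sample :: TL.Text
sample =
  TL.unlines
    [ "<body>"
    , "<h2>Collection 1</h2>"
    , "<table>", "<tr>Property 1</tr>", "<tr>Property 2</tr>", "</table>"
    , "<h2>Collection 2</h2>"
    , "<table>", "<tr>Property 1</tr>", "<tr>Property 88</tr>", "</table>"
    , "</body>"
    ]

main :: IO ()
main = mapM_ print (collections (fromDocument (parseText_ def sample)))
```

With the derived `Show` instance, each printed line has the shape asked for in the question, e.g. `MyClass "Collection 1" "Property 1"`. Because `immediateSibling` stops at the first element after each `h2`, the rows of one table can no longer leak into another collection.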