Alexey Raga

Reputation: 7525

How to skip elements in xml-conduit

I have to process fairly large XML files, and I want to use the streaming API of xml-conduit to go through them and extract the information I need. Streaming is especially appealing in my case because I need only a small part of the data in these files and only perform simple aggregations on it, so conduits are a good fit.

Now, I don't always know the exact structure of the file. Files are generated by different versions of (sometimes buggy) software around the world so I can't impose the schema.

I do know, however, which elements I am interested in and what they look like. But, as I said, these elements can appear in any order, mixed in with other elements.

What I need, I guess, is to skip all the elements I am not interested in and consider only the ones that I want.

I initially wanted to write something like this:

tagName "person" (requireAttr "age" <* ignoreAttrs) <|> ignoreTag (const True)

but it doesn't compile, because ignoreTag returns Maybe (), which does not match the type of the first alternative.

What is the way to skip all the "unknown" tags when using the xml-conduit streaming API?

Upvotes: 2

Views: 236

Answers (1)

palik

Reputation: 2863

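As proposed here, you can first filter the event stream so that only the person subtrees survive (takeTree passes them through, ignoreAnyTreeContent drops everything else) and then parse what is left: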

λ> runConduit $ Text.XML.Stream.Parse.parseLBS def "<foo>bar</foo><person age=\"25\">Michael</person><person age=\"2\">Eliezer</person>" .| many_ (choose [takeTree "person" ignoreAttrs, ignoreAnyTreeContent]) .| manyYield parsePerson .| Data.Conduit.List.consume
[Person 25 "Michael",Person 2 "Eliezer"]
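The GHCi line above relies on a Person type and a parsePerson parser that are not shown. Below is a minimal standalone sketch of the same pipeline, assuming a recent xml-conduit (one that exports takeTree and ignoreAnyTreeContent) plus the conduit and xml-types packages. The Person record, parsePerson (modelled on the example in the xml-conduit documentation), extractPeople and sampleXml are illustrative names, not necessarily what the answer's author used.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Conduit               (ConduitT, MonadThrow, runConduit, sinkList, (.|))
import qualified Data.ByteString.Lazy  as L
import           Data.Text             (Text, unpack)
import           Data.XML.Types        (Event)
import           Text.XML.Stream.Parse

-- Hypothetical record for illustration; the answer does not show its definition.
data Person = Person Int Text
    deriving (Eq, Show)

-- Assumed definition of parsePerson: reads the "age" attribute and the
-- text content of a single <person> element.
parsePerson :: MonadThrow m => ConduitT Event o m (Maybe Person)
parsePerson = tag' "person" (requireAttr "age") $ \age -> do
    name <- content
    -- 'read' is partial: a non-numeric age would raise an error here.
    return $ Person (read (unpack age)) name

-- Stage 1 keeps only <person> subtrees and discards any other tree or content.
-- Stage 2 parses the surviving events and yields one Person per element.
extractPeople :: MonadThrow m => L.ByteString -> m [Person]
extractPeople xml = runConduit $
       parseLBS def xml
    .| many_ (choose [takeTree "person" ignoreAttrs, ignoreAnyTreeContent])
    .| manyYield parsePerson
    .| sinkList

-- Same input as in the GHCi session above.
sampleXml :: L.ByteString
sampleXml = "<foo>bar</foo><person age=\"25\">Michael</person><person age=\"2\">Eliezer</person>"

main :: IO ()
main = extractPeople sampleXml >>= print
```

Splitting the work into a filtering stage and a parsing stage means parsePerson never sees the unknown elements, so it does not need a fallback branch whose type has to match.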

Upvotes: 1
