dan

Reputation: 45682

How do you keep "state" around when you're doing SAX parsing with Haskell (HaXml)?

I'm a total newbie at Haskell, and for my first real problem I'm trying to parse a huge XML file with HaXml's SAX parser.

The big problem I'm running into is how to figure out what the enclosing element tag of any particular "charData" SaxElement is. If I were doing this in an imperative language, I would just have a stateful Array object that maintains the element tag stack as SAX events happen. I would push an element name to the stack when a "SAX.SaxElementOpen" is encountered, and pop one off when "SAX.SaxElementClose" is encountered. Then if I got a "SAX.SaxCharData" event/element, I could just look at the top of the stack to see what tag it was enclosed in.

Now that I am trying to solve this problem in Haskell, I have no idea how to get around the lack of global stateful variables. I only have a vague notion of what Monads do, so if they are the solution, I could use a tip or two.

Here is hopefully enough code to show how far I've gotten:

module Main where

import qualified Text.XML.HaXml.SAX as SAX
import Text.XML.HaXml
import Data.Maybe
import Text.XML.HaXml.Namespaces

main :: IO ()
main = let inputFilename = "/path/to/file.xml" in
    do content <- readFile inputFilename
       -- saxParse also returns a possible parse error, which is ignored here
       let (elements, _parseError) = SAX.saxParse inputFilename content
       mapM_ putStrLn (summarizeElements elements)

summarizeElements :: [SAX.SaxElement] -> [String]
summarizeElements elements = filter (not . null) $ map summarizeElement elements

summarizeElement :: SAX.SaxElement -> String
summarizeElement element = case element of
    (SAX.SaxElementOpen name attrs)  -> myProcessElem name attrs
    (SAX.SaxCharData charData)       -> myProcessCharData charData 
    (SAX.SaxElementTag name attrs)  -> myProcessElem name attrs
    _ -> ""

Upvotes: 2

Views: 402

Answers (1)

Anthony

Reputation: 3791

The problem here is that map processes each element independently, so it cannot carry state from one element to the next. A straightforward approach is to write what you want as a recursive function that passes the state along through the recursive calls. You will need to decide what type of value you keep on your state stack, but then it's just a matter of...

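go :: MyStack -> [SAX.SaxElement] -> [String]
go _ [] = []
-- myProcessElem here also receives the current stack, so it can see the enclosing tag
go s (e:es) = myProcessElem s e : go s' es
  where s' = pushPop e s  -- push on an open tag, pop on a close tag, otherwise unchanged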
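
To make the idea concrete, here is a minimal, self-contained sketch of that recursive approach. It is only an illustration under a few assumptions: instead of HaXml's SAX.SaxElement it uses a simplified stand-in type (Event) so that it compiles with nothing but base, the stack is a plain list of tag names, and the names Event, TagStack, enclosing, and summarize are made up for this example. With real HaXml you would pattern match on SAX.SaxElementOpen, SAX.SaxElementClose, and SAX.SaxCharData in the same places.

```haskell
module Main where

-- Hypothetical stand-in for HaXml's SAX.SaxElement (attributes omitted),
-- used so this sketch needs only base.
data Event
  = Open String   -- plays the role of SAX.SaxElementOpen
  | Close String  -- plays the role of SAX.SaxElementClose
  | Chars String  -- plays the role of SAX.SaxCharData
  deriving (Show)

-- The "state": names of the currently open tags, innermost first.
type TagStack = [String]

-- Name of the innermost open tag, or a placeholder at top level.
enclosing :: TagStack -> String
enclosing (tag : _) = tag
enclosing []        = "(no enclosing tag)"

-- Walk the event list, passing the stack along as an argument.
-- Only character data produces an output line.
go :: TagStack -> [Event] -> [String]
go _ [] = []
go stack (e : es) = case e of
  Open name  -> go (name : stack) es   -- push
  Close _    -> go (drop 1 stack) es   -- pop (safe on an empty stack)
  Chars text -> (enclosing stack ++ ": " ++ text) : go stack es

-- Start with an empty stack.
summarize :: [Event] -> [String]
summarize = go []

main :: IO ()
main = mapM_ putStrLn (summarize sample)
  where
    sample =
      [ Open "a", Chars "x"
      , Open "b", Chars "y", Close "b"
      , Chars "z", Close "a"
      ]
```

Nothing is mutated here: each recursive call simply receives the stack it should see, which plays the role of the global variable in the imperative version. The same pattern can also be written with Data.List.mapAccumL, or later with a State monad, once plain argument passing feels comfortable.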

Upvotes: 1
