Jonny5

Reputation: 1499

Dom4j: sax parsing a loaded document

I am trying to accomplish the following:

  1. load a document (done)
  2. go through the document depth-first and use a DefaultHandler from the JDK to do some work

The reason I want to do this is that I already have a handler, which I currently use with a SAX parser. I now want to reuse that handler on the in-memory document.

This is useful because I have to run the handler many times: for large documents I want to use SAX, and for small ones I want to use the in-memory representation.

Thanks!

Upvotes: 0

Views: 335

Answers (1)

Don Roby

Reputation: 41137

The quickest way (quick in terms of coding effort) to accomplish this is to write the portion of the in-memory document that you want to process into a string, wrap that string in a StringReader, and pass it to a SAX parser together with your handler.
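A minimal sketch of that string round trip, using only JDK classes. `CountingHandler` is a hypothetical stand-in for the handler you already have, and the XML text is hard-coded here; with dom4j I would expect it to come from something like `document.asXML()`, but treat that as an assumption to check against your dom4j version.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StringSaxDemo {

    // Hypothetical stand-in for your existing handler: it just counts elements.
    static class CountingHandler extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    // Runs a SAX parse over XML text that is already held in memory.
    static void parseString(String xml, DefaultHandler handler) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
    }

    public static void main(String[] args) throws Exception {
        // Assumption: in the real program this string is the serialized in-memory document.
        String xml = "<root><item>a</item><item>b</item></root>";
        CountingHandler handler = new CountingHandler();
        parseString(xml, handler);
        System.out.println("elements seen: " + handler.elements);
    }
}
```

The cost of this approach is that the document is serialized and parsed again on every run, which may matter if the handler is applied many times.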

What you really need is to generate SAX events from your data and feed those events to the handler. You can do that by getting the data into the form of an InputSource or Reader and parsing it, which is the tactic described above, or you can simulate the SAX events by calling the methods of the ContentHandler you've already written directly. Calling them in the right order and with the right data may be painful, though, if your document is at all complex.
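To give an idea of what simulating the events by hand involves, here is a rough sketch of a depth-first walker that calls the ContentHandler methods directly. It uses the JDK's W3C DOM as a stand-in for the dom4j tree so that it runs without extra libraries; the dom4j version would walk dom4j nodes instead. It is deliberately simplified: namespaces are ignored, and comments and processing instructions are skipped. `TraceHandler` and `parseToTree` are hypothetical helpers for the demonstration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class TreeToSax {

    // Fires SAX events for the whole document, depth first.
    static void walk(Document doc, ContentHandler handler) throws SAXException {
        handler.startDocument();
        walkNode(doc.getDocumentElement(), handler);
        handler.endDocument();
    }

    private static void walkNode(Node node, ContentHandler handler) throws SAXException {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                // Copy the attributes into the structure SAX handlers expect.
                AttributesImpl atts = new AttributesImpl();
                NamedNodeMap map = node.getAttributes();
                for (int i = 0; i < map.getLength(); i++) {
                    Node a = map.item(i);
                    atts.addAttribute("", a.getNodeName(), a.getNodeName(), "CDATA", a.getNodeValue());
                }
                String name = node.getNodeName();
                handler.startElement("", name, name, atts);
                for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                    walkNode(c, handler);
                }
                handler.endElement("", name, name);
                break;
            }
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE: {
                char[] chars = node.getNodeValue().toCharArray();
                handler.characters(chars, 0, chars.length);
                break;
            }
            default:
                // Simplification: comments, processing instructions etc. are skipped.
                break;
        }
    }

    // Hypothetical handler that records the events it receives.
    static class TraceHandler extends DefaultHandler {
        final StringBuilder trace = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            trace.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            trace.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            trace.append(ch, start, length);
        }
    }

    // Helper for the demo only: builds the in-memory tree from text.
    static Document parseToTree(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        TraceHandler handler = new TraceHandler();
        walk(parseToTree("<root id=\"1\"><item>a</item><item>b</item></root>"), handler);
        System.out.println(handler.trace);
    }
}
```

Even this small version shows where the pain comes from: a handler that relies on namespaces, prefix mappings, or ignorable whitespace would need the walker to reproduce those events as well.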

If Dom4J provides a way to create an InputSource based on a node in your document structure, that will be the easiest to use, and likely much more efficient than writing it to a string first.

A better option may be to extract the portions of your ContentHandler that do the actual work into a separate class, which you can then use both from the ContentHandler and from a new class that walks the in-memory tree.
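One way that refactoring could look, as a sketch only. `ElementNameCollector` is a hypothetical example of "the actual work"; the SAX front end and the tree front end are both thin and only forward to it. The tree side again uses the JDK's W3C DOM in place of dom4j so that the example is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SharedWorkDemo {

    // The actual work, independent of SAX and of any tree API (hypothetical example).
    static class ElementNameCollector {
        final List<String> names = new ArrayList<>();

        void element(String name) {
            names.add(name);
        }
    }

    // Thin SAX front end: translates SAX callbacks into calls on the worker.
    static class SaxFrontEnd extends DefaultHandler {
        private final ElementNameCollector work;

        SaxFrontEnd(ElementNameCollector work) {
            this.work = work;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            work.element(qName);
        }
    }

    // Thin tree front end: depth-first walk that calls the same worker.
    static void walkTree(Node node, ElementNameCollector work) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            work.element(node.getNodeName());
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walkTree(c, work);
        }
    }

    // Large documents: stream through SAX.
    static List<String> viaSax(String xml) throws Exception {
        ElementNameCollector work = new ElementNameCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new SaxFrontEnd(work));
        return work.names;
    }

    // Small documents: walk the tree that is already in memory.
    static List<String> viaTree(String xml) throws Exception {
        ElementNameCollector work = new ElementNameCollector();
        Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        walkTree(doc, work);
        return work.names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><item>a</item><item>b</item></root>";
        System.out.println(viaSax(xml));
        System.out.println(viaTree(xml));
    }
}
```

The design point is that neither front end has to imitate the other: the walker never fakes SAX events, and the worker can be tested without any XML at all.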

Upvotes: 1
