This 0ne Pr0grammer

Reputation: 2662

Parse Text Values From XML File in Java

So right now I am using the SAX parser in Java to parse the "document.xml" file located within a .docx file's archive. Below is a sample of what I am trying to parse...

Sample XML Document

    <w:pStyle w:val="Heading2" />
  </w:pPr>
  <w:bookmarkStart w:id="0" w:name="_Toc258435889" />
  <w:bookmarkStart w:id="1" w:name="_Toc259085121" />
  <w:bookmarkStart w:id="2" w:name="_Toc259261685" />
  <w:r w:rsidRPr="00415FD6">
    <w:t>Text To Extract</w:t>
  </w:r>
  <w:bookmarkEnd w:id="0" />
  <w:bookmarkEnd w:id="1" />
  <w:bookmarkEnd w:id="2" />

Right now, I know how to extract attribute values; that's not hard. However, I do not know how to get at the actual text inside the elements (for example, the "Text To Extract" inside <w:t>). Does anyone have any ideas or prior experience with this? Thank you in advance.

Upvotes: 1

Views: 409

Answers (2)

Nathan Hughes

Reputation: 96385

Read this article on SAX parsing (it is old but still valid), paying particular attention to how the characters() method is implemented. It is very unintuitive and trips everybody up: you will get multiple calls to characters() for what seems like no good reason.

Also the Java tutorial on SAX has a short explanation of the characters method:

Parsers are not required to return any particular number of characters at one time. A parser can return anything from a single character at a time up to several thousand and still be a standard-conforming implementation. So if your application needs to process the characters it sees, it is wise to have the characters() method accumulate the characters in a java.lang.StringBuffer and operate on them only when you are sure that all of them have been found.

In your case (XML with no mixed content), that means accumulating the results of multiple characters() calls until the next call to endElement().
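A minimal sketch of that approach, assuming the JDK's built-in SAX parser with namespace awareness switched on. The namespace URI is the standard WordprocessingML one bound to the w: prefix; the class name WordTextHandler and the extract() helper are hypothetical. characters() only appends to a buffer, and the text of each <w:t> is read out in endElement(), once every chunk has arrived.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <w:t> element.
public class WordTextHandler extends DefaultHandler {
    // Namespace URI the w: prefix is bound to in document.xml
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private final List<String> texts = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private boolean inText = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (W_NS.equals(uri) && "t".equals(localName)) {
            inText = true;
            buffer.setLength(0); // fresh accumulation for this <w:t>
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node: only append here.
        if (inText) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inText && W_NS.equals(uri) && "t".equals(localName)) {
            texts.add(buffer.toString()); // all chunks have arrived by now
            inText = false;
        }
    }

    public List<String> getTexts() {
        return texts;
    }

    // Parses an XML string and returns the text of each <w:t>, in document order.
    public static List<String> extract(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so uri/localName are filled in
        SAXParser parser = factory.newSAXParser();
        WordTextHandler handler = new WordTextHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTexts();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body><w:p>"
            + "<w:pPr><w:pStyle w:val=\"Heading2\"/></w:pPr>"
            + "<w:r><w:t>Text To Extract</w:t></w:r>"
            + "</w:p></w:body></w:document>";
        System.out.println(extract(xml)); // prints [Text To Extract]
    }
}
```

Turning on setNamespaceAware(true) matters here: with namespace processing off, the parser may pass an empty uri and localName, and you would have to compare qName against "w:t", which ties the code to one particular prefix.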

Upvotes: 3

Ed Staub

Reputation: 15690

See the characters() method of ContentHandler. Read the javadoc carefully: you can get multiple calls for a single run of text when you might expect only one.
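Applied to the .docx case from the question, a sketch might open the archive with java.util.zip.ZipFile and stream the XML part straight into the parser. It assumes the main part sits at word/document.xml (the usual location; strictly, the package relationships decide) and that paragraphs are not nested (text boxes can nest them). DocxTextReader and readParagraphs are hypothetical names. Because Word can split one visible phrase across several <w:r> runs, this version keeps accumulating characters() output until the enclosing </w:p> rather than stopping at each </w:t>.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: returns the plain text of each <w:p> in a .docx file.
public class DocxTextReader {
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static List<String> readParagraphs(String docxPath) throws Exception {
        List<String> paragraphs = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Assumes paragraphs are not nested (e.g. no text boxes).
            private final StringBuilder paragraph = new StringBuilder();
            private boolean inText = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (!W_NS.equals(uri)) return;
                if ("p".equals(localName)) paragraph.setLength(0);
                else if ("t".equals(localName)) inText = true;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Chunks from every <w:t> in the paragraph are appended in order.
                if (inText) paragraph.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (!W_NS.equals(uri)) return;
                if ("t".equals(localName)) inText = false;
                else if ("p".equals(localName)) paragraphs.add(paragraph.toString());
            }
        };

        try (ZipFile zip = new ZipFile(docxPath)) {
            // Assumed location of the main document part inside the archive
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IllegalArgumentException("no word/document.xml in " + docxPath);
            }
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            try (InputStream in = zip.getInputStream(entry)) {
                factory.newSAXParser().parse(in, handler);
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) throws Exception {
        for (String p : readParagraphs(args[0])) {
            System.out.println(p);
        }
    }
}
```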

Upvotes: 2
