Reputation: 1243
I'm using a SAX parser to parse a document from the internet that I cannot change. It was working just fine when the documents came formatted like this:
<outertag>
  <innertag>data</innertag>
  <innertag>moreData</innertag>
</outertag>
However, for certain calls the XML comes back without the outer tags, so I essentially get just a list of data, like this:
<innertag>data</innertag>
<innertag>moreData</innertag>
This seems silly to me, but I don't get to choose how the XML is formatted, and it can't be changed for now. The problem is that the SAX parser seems to fire the endDocument event as soon as it reaches the first closing innertag.
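The underlying reason is that a well-formed XML document has exactly one root element, so a conforming parser treats anything after the first top-level closing tag as an error rather than as more data. The sketch below is one way to observe this with the JDK's built-in SAX parser; the class and method names are hypothetical, and the exact callbacks and error message depend on the parser implementation (the endDocument behaviour described above suggests the asker's platform reports it differently).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class FragmentDemo {
    /** Counts the start tags reported before the parser stops. */
    static int countStartTags(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader xmlReader = factory.newSAXParser().getXMLReader();

        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        xmlReader.setContentHandler(handler);
        // DefaultHandler.fatalError rethrows the SAXParseException it receives
        xmlReader.setErrorHandler(handler);
        try {
            xmlReader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Reached when a second top-level element follows the root element
            System.out.println("Parse error: " + e.getMessage());
        }
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        int seen = countStartTags("<innertag>data</innertag><innertag>moreData</innertag>");
        System.out.println("Start tags seen: " + seen);
    }
}
```

Only the first innertag is reported; the parser rejects the second one instead of treating it as a sibling.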
I have a rather hacky solution: I read the InputStream into a String, wrap it in tags, and convert it back to an InputStream. It actually parses fine that way, but surely there's a better way. I'd also prefer not to write a whole other parser, since most of the tags are the same apart from the missing opening and closing tags.
Just for the heck of it, I'll post the code, but it's a pretty standard SAX parser. The original actually parses about 30 tags:
void parseUrl(URL url) { // hypothetical method name; the original signature wasn't posted
    try {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Without namespace awareness, some parsers pass an empty localName to startElement
        factory.setNamespaceAware(true);
        SAXParser saxParser = factory.newSAXParser();
        XMLReader xmlReader = saxParser.getXMLReader();
        MyHandler handler = new MyHandler();
        xmlReader.setContentHandler(handler);
        InputSource inputSource = new InputSource(url.openStream());
        xmlReader.parse(inputSource);
    }
    catch (SAXException e) { e.printStackTrace(); }
    catch (ParserConfigurationException e) { e.printStackTrace(); }
    catch (Exception e) { e.printStackTrace(); }
}
private class MyHandler extends DefaultHandler {
    private StringBuilder content;

    public MyHandler() {
        content = new StringBuilder();
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        content = new StringBuilder();
        if (localName.equalsIgnoreCase("innertag")) {
            // Doing stuff
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        // Doing stuff
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        content.append(ch, start, length);
    }

    @Override
    public void endDocument() throws SAXException {
        // When parsing the second type of document, this fires right after the first innertag closes
    }
}
And, in case it matters, here's the hacky workaround I'm using. It works, but it just feels wrong:
BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
StringBuilder sb = new StringBuilder("<tag>");
String line;
while ((line = reader.readLine()) != null) {
    // readLine() strips line breaks; put them back so text spanning lines isn't merged
    sb.append(line).append('\n');
}
reader.close();
sb.append("</tag>");
// Parse from a Reader so the String isn't re-encoded with the platform default charset
InputSource source = new InputSource(new StringReader(sb.toString()));
xmlReader.parse(source);
Upvotes: 0
Views: 1921
Reputation: 163549
The XML you have is not a well-formed document, but it is a well-formed external parsed entity, which means it can be referenced from a well-formed document by means of an entity reference. So create a skeleton document like this:
<!DOCTYPE doc [
<!ENTITY e SYSTEM "data.xml">
]>
<doc>&e;</doc>
Here data.xml is your XML (any URI works as the system identifier, including the remote URL). Pass this skeleton document to the XML parser in place of the original. Beats writing dozens of lines of Java code.
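A minimal sketch of this approach with the JDK's SAX parser follows. It assumes the parser resolves external general entities, which the JDK's built-in parser does by default but which hardened configurations or other platforms may disable. The class and method names are hypothetical, and a temporary file stands in for the remote document; in the question's setting you would pass `url.toString()` as the system identifier. An absolute URI is used because the skeleton is read from a string with no base URI against which a relative name like data.xml could be resolved.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EntityWrapperDemo {
    /** Parses a multi-root fragment by referencing it as an external parsed entity. */
    static List<String> parseFragment(String fragmentSystemId) throws Exception {
        // Skeleton document: &e; pulls the fragment in as the content of <doc>
        String skeleton =
              "<!DOCTYPE doc [\n"
            + "  <!ENTITY e SYSTEM \"" + fragmentSystemId + "\">\n"
            + "]>\n"
            + "<doc>&e;</doc>";

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader xmlReader = factory.newSAXParser().getXMLReader();

        final List<String> names = new ArrayList<>();
        xmlReader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(localName); // records every element name, including <doc>
            }
        });
        xmlReader.parse(new InputSource(new StringReader(skeleton)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the remote fragment
        File tmp = File.createTempFile("data", ".xml");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(),
            "<innertag>data</innertag><innertag>moreData</innertag>".getBytes(StandardCharsets.UTF_8));
        System.out.println(parseFragment(tmp.toURI().toString()));
    }
}
```

The existing handler sees the fragment's elements exactly as if they had arrived inside an outer tag. The only extra event is the startElement/endElement pair for the skeleton's <doc> element.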
Upvotes: 0
Reputation: 128909
I'd say what you're doing now is about as good as you'll get. The one thing to consider improving is the stream -> string -> stream conversion, especially if the documents are large. You could use something like Guava's ByteStreams.join(), which concatenates streams instead of strings. Something like the following:
// Written against an older Guava release: InputSupplier and ByteStreams.join(InputSupplier...)
// were later deprecated and removed from Guava.
import com.google.common.io.*;
import java.io.*;

public class ConcatenateStreams {
    public static void main(String[] args) throws Exception {
        InputStream malformedXmlContent = externalXmlStream();
        // Sandwich the fragment between an opening and a closing root tag
        InputSupplier<InputStream> joined = ByteStreams.join(
                inputSupplier("<root>"),
                inputSupplier(malformedXmlContent),
                inputSupplier("</root>"));
        ByteStreams.copy(joined, System.out);
    }

    // Stand-in for the real stream from the network
    private static InputStream externalXmlStream() {
        return new ByteArrayInputStream("<foo>5</foo><bar>10</bar>".getBytes());
    }

    private static InputSupplier<InputStream> inputSupplier(final String text) {
        return inputSupplier(new ByteArrayInputStream(text.getBytes()));
    }

    private static InputSupplier<InputStream> inputSupplier(final InputStream inputStream) {
        return new InputSupplier<InputStream>() {
            @Override
            public InputStream getInput() throws IOException {
                return inputStream;
            }
        };
    }
}
which outputs:
<root><foo>5</foo><bar>10</bar></root>
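The same stream-level concatenation can be done with only the JDK through `java.io.SequenceInputStream`, which avoids both the String round trip and the Guava `InputSupplier` API shown above, which has since been removed from Guava. The following is a minimal sketch under two assumptions: the fragment is UTF-8, and it has no XML declaration of its own, since a declaration placed after a prepended root tag would make the input malformed. The class and method names are hypothetical; in the question's setting you would call `wrap(url.openStream())`.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class WrappedStreamDemo {
    /** Wraps a stream of sibling elements in <root>...</root> without buffering it. */
    static InputStream wrap(InputStream fragment) {
        List<InputStream> parts = Arrays.asList(
            new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8)),
            fragment,
            new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8)));
        // Reads each part in turn, closing each one as it is exhausted
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    /** Parses the stream and returns the element names in document order. */
    static List<String> elementNames(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader xmlReader = factory.newSAXParser().getXMLReader();
        final List<String> names = new ArrayList<>();
        xmlReader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(localName);
            }
        });
        xmlReader.parse(new InputSource(in));
        return names;
    }

    public static void main(String[] args) throws Exception {
        InputStream fragment = new ByteArrayInputStream(
            "<foo>5</foo><bar>10</bar>".getBytes(StandardCharsets.UTF_8));
        System.out.println(elementNames(wrap(fragment)));
    }
}
```

Because the wrapper is applied at the byte level, the existing handler needs no changes. It only has to ignore the extra root element.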
Upvotes: 1