Reputation: 34296
I am trying to connect to a remote server via a socket, and I get large XML responses back from the socket, delimited by a '\n' character.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<data>
.......
.......
</data>
</Response>\n <---- \n acts as delimiter
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<data>
....
....
</data>
</Response>\n
..
I am trying to parse these XML responses with a SAX parser. Ideally I would read one full response into a string by searching for the '\n' and hand that string to the parser. But a single response is very large, and I get an OutOfMemoryError when holding that much XML in a string. So the only remaining option was to stream the XML to SAX.
SAXParserFactory spfactory = SAXParserFactory.newInstance();
SAXParser saxParser = spfactory.newSAXParser();
XMLReader xmlReader = saxParser.getXMLReader();
xmlReader.setContentHandler(new MyDefaultHandler(context));
InputSource xmlInputSource = new InputSource(new
CloseShieldInputStream(mySocket.getInputStream()));
xmlReader.parse(xmlInputSource);
I am using CloseShieldInputStream to prevent SAX from closing my socket stream when it throws an exception because of the '\n'. I asked a previous question about that.
Now I sometimes get a parse error:
org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 8: not well-formed (invalid token)
I searched for it and found that this error normally occurs when the encoding of the actual XML is not what SAX expects. I wrote a C program to print out the XML, and all of my XML is encoded as UTF-8.
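Now my question: is there any way to print (or write to a file) the input to SAX as it streams from the socket? I tried this: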
OutputStream log = new BufferedOutputStream(new FileOutputStream("log.txt"));
InputSource xmlInputSource = new InputSource(new CloseShieldInputStream(new
TeeInputStream(mReadStream, log)));
xmlReader.parse(xmlInputSource);
A new file named log.txt is created when I mount the SD card, but it is empty. Am I using this right?
I worked it out with TeeInputStream itself. Thanks to Hemal Pandya for suggesting it.
// open a log file in append mode
OutputStream log = new BufferedOutputStream(new FileOutputStream("log.txt", true));
InputSource xmlInputSource = new InputSource(new CloseShieldInputStream(new
TeeInputStream(mReadStream, log)));
try {
    xmlReader.parse(xmlInputSource);
    // flush the buffered log content to the file; this line only runs if parsing completed successfully
    log.flush();
} catch (SAXException e) {
    // we want the log even if parsing failed, so flush here as well
    log.flush();
}
Upvotes: 2
Views: 4377
Reputation: 598309
I am not familiar with Expat, but to accomplish what you are describing in general, you need a SAX parser that supports pushing data into the parser instead of having the parser pull data from a source. Check whether Expat supports a push model. If it does, you can simply read a chunk of data from the socket and push it to the parser, which will parse whatever it can from the chunk and cache any remaining data for use during the next push. Repeat as needed until you are ready to close the socket connection. In this model, the \n separator would be treated as miscellaneous whitespace between nodes, so you have to use the SAX events to detect when a new <Response> node opens and closes. Also, because you are receiving multiple <Response> nodes in the data, and XML does not allow more than one top-level document node, you would need to push a custom opening tag into the parser before you start pushing the socket data into it. The custom opening tag then becomes the top-level document node, and the <Response> nodes become its children.
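If no push API is available, a different technique from the one above is to stay with the pull model and hand the parser one document at a time. What follows is only a sketch under assumptions, not something tested against Expat on Android: it assumes a raw '\n' byte never occurs inside a response (only as the delimiter), and the names DelimitedInputStream and DelimitedSaxDemo are hypothetical. The wrapper reports end-of-stream at each '\n' and ignores close(), so nothing is buffered into a String and the socket stays open. Parsing each response as its own document also avoids the issue that every response carries its own XML declaration, which is only legal at the very start of a document.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: exposes the bytes up to the next '\n' as one stream. */
class DelimitedInputStream extends FilterInputStream {
    private boolean done = false; // set once the delimiter or the real EOF is seen

    DelimitedInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        if (done) return -1;
        int b = in.read();
        if (b == -1 || b == '\n') { // assumption: '\n' appears only as the delimiter
            done = true;
            return -1;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int n = 0;
        // Byte-at-a-time so we never consume data belonging to the next document.
        while (n < len) {
            int b = read();
            if (b == -1) break;
            buf[off + n++] = (byte) b;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public void close() {
        // Intentionally empty: the parser may call close(), the socket must stay open.
    }
}

class DelimitedSaxDemo {
    /** Parses each '\n'-terminated document in turn; returns the element count per document. */
    static List<Integer> countElementsPerDocument(InputStream raw) throws Exception {
        InputStream source = new BufferedInputStream(raw); // gives mark/reset for the EOF peek
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        List<Integer> counts = new ArrayList<>();
        while (true) {
            source.mark(1);
            if (source.read() == -1) break; // real end of the connection
            source.reset();
            final int[] elements = {0};
            DelimitedInputStream doc = new DelimitedInputStream(source);
            parser.parse(doc, new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    elements[0]++;
                }
            });
            while (doc.read() != -1) { /* drain anything the parser left unread */ }
            counts.add(elements[0]);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        String decl = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        String wire = decl + "<Response><data/></Response>\n"
                    + decl + "<Response><data><x/></data></Response>\n";
        InputStream in = new ByteArrayInputStream(wire.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElementsPerDocument(in));
    }
}
```

In real use the ByteArrayInputStream would be replaced by the socket's input stream and the counting handler by your own content handler. If the server ever sends a blank line between responses, the empty document would need to be skipped before parsing.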
Upvotes: 0
Reputation: 28761
Is there any way to print (or write to any file) the input to SAX as it streams from socket?
Apache Commons has a TeeInputStream that should be useful.
OutputStream log = new BufferedOutputStream(new FileOutputStream("response.xml"));
InputSource xmlInputSource = new InputSource(new
CloseShieldInputStream(new TeeInputStream(mySocket.getInputStream(), log)));
I haven't used it, so you might want to try it first in a standalone program to figure out its close semantics. Looking at the docs and your requirements, though, it looks like you would want to close the log stream separately at the end.
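A standalone experiment along those lines might look like the sketch below. It deliberately avoids Commons IO so that it runs with the JDK alone: MiniTeeInputStream is a hypothetical stand-in for TeeInputStream, not the real class, and a ByteArrayOutputStream stands in for the log file. What it is meant to show is that bytes copied into a BufferedOutputStream stay in its in-memory buffer until flush() or close() is called, which is one plausible reason for a log file that exists but is empty.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a tee stream: copies every byte read to a side branch. */
class MiniTeeInputStream extends FilterInputStream {
    private final OutputStream branch;

    MiniTeeInputStream(InputStream in, OutputStream branch) {
        super(in);
        this.branch = branch;
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b != -1) branch.write(b);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) branch.write(buf, off, n);
        return n;
    }
}

class TeeFlushDemo {
    /** Returns {bytes visible in the sink before flush, bytes visible after flush}. */
    static int[] run(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the log file
        OutputStream log = new BufferedOutputStream(sink);         // buffers writes in memory
        InputStream in = new MiniTeeInputStream(new ByteArrayInputStream(data), log);
        byte[] buf = new byte[64];
        while (in.read(buf) != -1) { /* consume the stream, as the parser would */ }
        int before = sink.size(); // nothing has left the buffer yet for small inputs
        log.flush();
        int after = sink.size();  // now the sink holds everything that was read
        return new int[] { before, after };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "<Response><data/></Response>\n".getBytes(StandardCharsets.UTF_8);
        int[] r = run(data);
        System.out.println("before flush: " + r[0] + ", after flush: " + r[1]);
    }
}
```

With a real file the same reasoning suggests flushing (or closing) the log in a finally block, so that the captured input survives a parse failure.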
Upvotes: 2