Reputation: 155
I need to check a single node in each of 109 XML files, which are stored at 109 different URLs. I use this code:
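import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class XPathParserXML {
    public String version(String link, String serial) throws SAXException, IOException,
            ParserConfigurationException, XPathExpressionException {
        String url = link + serial;
        // A new factory, builder and compiled expression are created on every call
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(url);
        XPath xPath = XPathFactory.newInstance().newXPath();
        XPathExpression expr = xPath.compile("//swVersion/text()");
        NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
        // evaluate() returns an empty list, not null, when nothing matches
        if (nodes.getLength() == 0) {
            return "!!WORKING!!";
        }
        return nodes.item(0).getNodeValue();
    }
}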
and I call the method version(link, serial) in a loop, 109 times.
My code takes about 20 seconds to process all of the files. Each file is 0.64 KB and I have a 20 MB connection.
What can I do to speed up my code?
Upvotes: 1
Views: 522
Reputation: 221210
While that's probably not the only issue, you should definitely cache and reuse all of these objects between calls to version():
DocumentBuilderFactory
DocumentBuilder
XPathFactory
XPathExpression
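As a minimal sketch of that caching, the class below builds the DocumentBuilder and the compiled XPathExpression once and reuses them for every document. The class name CachedVersionReader and the InputStream overload are my own additions for illustration, not part of the original code. The instance is not thread-safe, so it assumes one reader per thread.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: creates the expensive JAXP objects once and reuses them.
class CachedVersionReader {
    private final DocumentBuilder builder;
    private final XPathExpression expr;

    CachedVersionReader() throws ParserConfigurationException, XPathExpressionException {
        builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        expr = XPathFactory.newInstance().newXPath().compile("//swVersion/text()");
    }

    // Parses the document at the given URL, as in the original version(link, serial).
    String version(String url) throws SAXException, IOException, XPathExpressionException {
        return extract(builder.parse(url));
    }

    // Same lookup for an already opened stream (handy for local testing).
    String version(InputStream xml) throws SAXException, IOException, XPathExpressionException {
        return extract(builder.parse(xml));
    }

    private String extract(Document doc) throws XPathExpressionException {
        // STRING yields the text of the first match, or "" when nothing matches
        String value = (String) expr.evaluate(doc, XPathConstants.STRING);
        return value.isEmpty() ? "!!WORKING!!" : value;
    }
}
```

The loop would then create one CachedVersionReader up front and call version(link + serial) on it 109 times.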
Besides, you should probably activate one of these flags:
-Dorg.apache.xml.dtm.DTMManager=org.apache.xml.dtm.ref.DTMManagerDefault
or
-Dcom.sun.org.apache.xml.internal.dtm.DTMManager=com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault
See also this question for details:
Java XPath (Apache JAXP implementation) performance
Last but not least, you're serially accessing all those XML files over the wire. It may be useful to reduce the impact of your connection latency by parallelising access to those files, e.g. by using multiple threads on the client side. (Note that if you choose multi-threading, you should beware of thread-safety issues when caching the objects I've mentioned in the first section. Also, avoid creating too many parallel requests at the same time, to prevent your server from failing.)
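One possible way to parallelise the downloads is a small fixed-size thread pool, sketched below. ParallelVersionFetcher, fetchAll and the fetcher parameter are hypothetical names; the fetcher stands in for whatever downloads and parses a single file (for example a call to version(link, serial)). The pool size caps the number of simultaneous requests, and each worker thread should use its own DocumentBuilder and XPathExpression, since those objects are not thread-safe.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: runs one lookup per serial on a bounded thread pool.
class ParallelVersionFetcher {
    static Map<String, String> fetchAll(List<String> serials,
                                        Function<String, String> fetcher,
                                        int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit all lookups; at most `threads` of them run at the same time
            List<Future<String>> futures = new ArrayList<>();
            for (String serial : serials) {
                Callable<String> task = () -> fetcher.apply(serial);
                futures.add(pool.submit(task));
            }
            // Collect the results in the original order of the serials
            Map<String, String> versions = new LinkedHashMap<>();
            for (int i = 0; i < serials.size(); i++) {
                versions.put(serials.get(i), futures.get(i).get());
            }
            return versions;
        } finally {
            pool.shutdown();
        }
    }
}
```

With 109 small files, a pool of perhaps 4 to 8 threads is a reasonable starting point to try; the right number depends on what the server tolerates.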
Another way to reduce that impact would be to expose those XML files in a ZIP file from the server to avoid multiple connections and transfer all XML files at once.
From your additional comments, I see that you're using XML validation. This is, of course, expensive and should only be done if really needed. Since you run a very arbitrary XPath expression, I take it that you don't care too much about XML validation. Best turn it off!
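A sketch of how validation might be switched off on the factory follows. The class name NonValidatingBuilders is made up for illustration. The load-external-dtd feature URI is specific to Xerces-based parsers (which I'm assuming is what you run, as it is the JDK default), so the code ignores the error if another parser rejects it.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper: builds a DocumentBuilder with validation turned off.
class NonValidatingBuilders {
    static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Do not validate documents against a DTD
        factory.setValidating(false);
        try {
            // Xerces-specific: do not even fetch an external DTD referenced by a DOCTYPE
            factory.setFeature(
                    "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        } catch (ParserConfigurationException unsupported) {
            // Assumption: other parser implementations may not know this feature
        }
        return factory.newDocumentBuilder();
    }
}
```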
Since (from your comments) you've measured the parsing to take up most of the CPU, you have two more options to circumvent the whole issue:
Use a SAX parser to find the //swVersion element (from your code, I'm assuming that there is only one). SAX is much faster than DOM for these use-cases.
Use a regular expression such as <swVersion>(.*?)</swVersion>. That should only be your last resort, because it doesn't handle
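The SAX option could be sketched roughly as follows. SaxVersionReader and the private Found exception are hypothetical names; throwing an exception from the handler is just one common way to stop a SAX parse early, once the first swVersion element has been read. The sketch assumes a non-namespace-aware parser (the factory default), so the element is matched by its qualified name.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams the document and stops at the first swVersion.
class SaxVersionReader {
    // Marker exception used only to abort parsing early
    private static final class Found extends SAXException {
        Found() { super("swVersion found"); }
    }

    static String version(InputStream xml)
            throws ParserConfigurationException, SAXException, IOException {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inside;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("swVersion".equals(qName)) inside = true;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text may arrive in several chunks, so append rather than assign
                if (inside) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (inside && "swVersion".equals(qName)) throw new Found();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        } catch (Found stop) {
            // Expected: the rest of the document is deliberately skipped
        }
        return text.length() == 0 ? "!!WORKING!!" : text.toString();
    }
}
```

To read from a URL instead of a stream, SAXParser also has a parse(String uri, DefaultHandler) overload.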
Upvotes: 3