Reputation: 1666
I'm going to create a class which should unmarshal very large XML files.
I've implemented the general unmarshalling:
public XMLProcessor(XMLFile file) throws JAXBException, IOException, SAXException {
    JAXBContext jc = JAXBContext.newInstance(Customers.class);
    Unmarshaller unmarshaller = jc.createUnmarshaller();
    // Copy the uploaded file to disk, then unmarshal the whole document at once
    File xml = new File(file.getFile().getOriginalFilename());
    file.getFile().transferTo(xml);
    this.customers = (Customers) unmarshaller.unmarshal(xml);
}
It works fine, but it takes more than a minute to process an XML file with 1 million customers.
Can I improve performance by creating multiple threads and unmarshalling a few parts of the XML file concurrently?
How should I split my XML file into parts?
Could you show me some sample code for my case?
Upvotes: 1
Views: 1468
Reputation: 21
Although I cannot provide a complete solution yet, I'd like to share the approach I am currently implementing for a similar problem. My XML file structure is like:
<products>
  <product id="p1">
    <variant id="v1"></variant>
    <variant id="v2"></variant>
  </product>
  <product id="p2">
    <variant id="v3"></variant>
    <variant id="v4"></variant>
  </product>
</products>
Products and variants may be quite complex, with many attributes, lists, etc.
My current approach is to use SAX to extract the XML stream of a single product entity and then hand it over to a new unmarshaller thread (with standard multi-threading practices: limiting to a maximum thread count, etc.).
However, I am still not 100% confident that SAX doesn't generate too much overhead (which could eat up the multi-threading benefit). If that turns out to be the case, I'll try to read the XML stream directly, reacting to the open/close tags for "". As this won't be XML-conformant, it is my measure of last resort.
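A rough sketch of the "extract one entity, hand it to a worker" idea, using StAX instead of SAX for the extraction (both are streaming parsers; StAX makes it easy to cut out one element with an identity `Transformer`). The class name `FragmentSplitter` and the inline sample XML are illustrative; in real code the worker callable would run a per-thread JAXB `Unmarshaller` on the fragment instead of just returning it:

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FragmentSplitter {
    public static void main(String[] args) throws Exception {
        // Illustrative stand-in for the real (huge) file
        String xml =
            "<products>" +
            "<product id=\"p1\"><variant id=\"v1\"></variant><variant id=\"v2\"></variant></product>" +
            "<product id=\"p2\"><variant id=\"v3\"></variant><variant id=\"v4\"></variant></product>" +
            "</products>";

        XMLStreamReader reader = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<String>> results = new ArrayList<>();

        // Walk the stream; whenever the reader sits on a <product> start tag,
        // the identity Transformer serializes that element (including all its
        // children) and leaves the reader positioned after its end tag.
        while (reader.hasNext()) {
            if (reader.getEventType() == XMLStreamConstants.START_ELEMENT
                    && "product".equals(reader.getLocalName())) {
                StringWriter fragment = new StringWriter();
                t.transform(new StAXSource(reader), new StreamResult(fragment));
                // Hand the self-contained fragment to a worker thread.
                // Real code: unmarshal it here with a per-thread Unmarshaller.
                results.add(pool.submit(() -> fragment.toString()));
            } else {
                reader.next();
            }
        }
        pool.shutdown();

        for (Future<String> f : results) {
            System.out.println(f.get());
        }
    }
}
```

Note that only the per-fragment work is parallelized; the splitting pass itself stays single-threaded, so this only pays off when unmarshalling a fragment costs noticeably more than extracting it.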
Upvotes: 1