silent-box

Reputation: 1666

Java JAXB multithreading unmarshalling

I'm going to create a class that should unmarshal very large XML files.

I've implemented general unmarshalling:

public XMLProcessor(XMLFile file) throws JAXBException, IOException, SAXException {

    // Build the JAXB context for the root class and get an unmarshaller from it.
    JAXBContext jc = JAXBContext.newInstance(Customers.class);
    Unmarshaller unmarshaller = jc.createUnmarshaller();

    // Copy the incoming file to a local File, then unmarshal the whole document in one pass.
    File xml = new File(file.getFile().getOriginalFilename());
    file.getFile().transferTo(xml);
    this.customers = (Customers) unmarshaller.unmarshal(xml);
}

It works fine, but it takes more than a minute to process an XML file with 1 million customers.

Can I improve performance by creating multiple threads and unmarshalling several parts of the XML file concurrently?

How should I split my XML file into parts?

Could you show me some sample code for my case?

Upvotes: 1

Views: 1468

Answers (1)

cgicgi

Reputation: 21

Although I cannot provide a complete solution yet, I'd like to share the approach that I am currently implementing for a similar problem. My XML file is structured like this:

<products>
  <product id ="p1">
    <variant id="v1"></variant>
    <variant id="v2"></variant>
  </product>
  <product id ="p2">
    <variant id="v3"></variant>
    <variant id="v4"></variant>
  </product>
</products>

Products and variants may be quite complex, with many attributes, lists, etc.

My current approach is to use SAX to extract the XML stream of a single product entity and then hand it over to a new unmarshaller thread (with the usual multithreading precautions, such as limiting the maximum thread count).

However, I am not yet sure whether SAX adds too much overhead (which could eat up the multithreading benefit). If it does, I'll try reading the XML stream directly, reacting to the opening and closing "<product>" tags. As this won't be XML-conformant, it is my measure of last resort.
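
For illustration, here is a minimal sketch of the SAX-based splitting described above. It is not my finished implementation, just one way it could look. The class name ProductSplitter, the method processAll, and the worker parameter are hypothetical names made up for this example. To keep it runnable on a plain JDK, the per-fragment work is passed in as a function. With JAXB, that function would presumably call unmarshal on a StringReader over the fragment, creating an Unmarshaller per task from one shared JAXBContext (the context is thread-safe, an Unmarshaller is not). The sketch assumes no XML namespaces, and it drops comments and processing instructions.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: cuts a large document into one XML string per target element
// and processes those strings on a bounded thread pool.
class ProductSplitter {

    /** Escapes characters that must not appear raw in XML text or attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Returns the worker results in document order. */
    static <R> List<R> processAll(InputStream xml, String elementName, int maxThreads,
                                  Function<String, R> worker) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads); // caps the thread count
        List<Future<R>> pending = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(xml, new DefaultHandler() {
                private StringBuilder fragment; // non-null only while inside a target element
                private int depth;              // nesting depth within the current fragment

                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    if (fragment == null && qName.equals(elementName)) {
                        fragment = new StringBuilder();
                        depth = 0;
                    }
                    if (fragment == null) return; // outside any target element
                    depth++;
                    fragment.append('<').append(qName);
                    for (int i = 0; i < atts.getLength(); i++) {
                        fragment.append(' ').append(atts.getQName(i))
                                .append("=\"").append(escape(atts.getValue(i))).append('"');
                    }
                    fragment.append('>');
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    if (fragment != null) fragment.append(escape(new String(ch, start, length)));
                }

                @Override
                public void endElement(String uri, String local, String qName) {
                    if (fragment == null) return;
                    fragment.append("</").append(qName).append('>');
                    if (--depth == 0) { // target element closed: hand it to a worker thread
                        String text = fragment.toString();
                        fragment = null;
                        Callable<R> task = () -> worker.apply(text);
                        pending.add(pool.submit(task));
                    }
                }
            });
            List<R> results = new ArrayList<>();
            for (Future<R> f : pending) results.add(f.get()); // wait for every fragment
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call could look like ProductSplitter.processAll(inputStream, "product", 4, fragment -> ...). Note that the pool's queue is unbounded here, so if parsing runs far ahead of the workers, many fragments may sit in memory at once. Whether the re-serialization cost is acceptable is exactly the overhead question I still need to measure.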

Upvotes: 1
