Reputation: 1192
I'll be receiving a large number of XML files (tens of thousands every few minutes) from an MQ. The files aren't very big. I have to extract the information and save it to a database. Unfortunately, I cannot use third-party libraries (except Apache Commons). What strategies or techniques are normally used in this scenario? Is there an XML parser in Java or Apache that handles such situations well?
I might also add that I'm using JDK 1.4.
Upvotes: 2
Views: 688
Reputation: 4767
Based on the comments and discussion around this topic, I would like to propose a consolidated solution.
Parse the XML files using SAX - As @markspace mentioned, you should go with SAX, which is built in and performs well.
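As a rough illustration of the SAX approach, here is a minimal sketch using only the JAXP classes bundled with the JDK. The element name `name` and the classes `NameHandler` and `SaxExample` are made up for the example, and the code avoids generics and other newer language features so that it stays close to what JDK 1.4 accepts.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Collects the text of every <name> element (hypothetical element name).
class NameHandler extends DefaultHandler {
    private final List names = new ArrayList(); // raw type: no generics in JDK 1.4
    private StringBuffer text;                  // non-null only while inside <name>

    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("name".equals(qName)) {
            text = new StringBuffer();
        }
    }

    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so append each one.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    public void endElement(String uri, String localName, String qName) {
        if ("name".equals(qName)) {
            names.add(text.toString().trim());
            text = null;
        }
    }

    public List getNames() {
        return names;
    }
}

class SaxExample {
    // Parses one small XML message and returns the collected <name> values.
    static List parseNames(byte[] xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        NameHandler handler = new NameHandler();
        parser.parse(new ByteArrayInputStream(xml), handler);
        return handler.getNames();
    }
}
```

The handler keeps only the fields it needs and never builds a full document tree, which is where SAX saves memory compared with DOM when messages arrive in bulk.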
Use BULK INSERTS if possible - Since you plan to insert a large amount of data, consider what type of data you are reading and storing in the database. Do all the XML files share the same schema (meaning they map to a single table in the database), or do they represent different objects (meaning you would end up inserting data into multiple tables)?
If all the XML files share a schema and go into the same table, consider batching these data objects and bulk-inserting them into the database. This will perform better in terms of both time and resources (you would open only a single connection to persist a batch, as opposed to a separate connection for each object). Of course, you would need to spend some time tuning your batch size and deciding on an error-handling strategy for batch inserts (discard the whole batch vs. discard only the erroneous rows).
If the schemas of the XML files differ, consider grouping similar XMLs together so that you can BULK INSERT each group later.
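To make the batching idea concrete, here is a minimal sketch built on the standard JDBC batch calls (`addBatch` and `executeBatch`). The class `OrderBatchWriter`, the table `orders` and its columns are assumptions for illustration only, and the error handling shown is the simple "discard the whole batch" strategy.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Buffers rows on one PreparedStatement and sends them in batches.
// The caller owns the Connection and is responsible for closing it.
class OrderBatchWriter {
    private final Connection conn;
    private final PreparedStatement insert;
    private final int batchSize;
    private int pending = 0;

    OrderBatchWriter(Connection conn, int batchSize) throws SQLException {
        this.conn = conn;
        this.batchSize = batchSize;
        conn.setAutoCommit(false); // commit once per batch, not once per row
        // Hypothetical table and columns.
        this.insert = conn.prepareStatement("INSERT INTO orders (id, name) VALUES (?, ?)");
    }

    void add(int id, String name) throws SQLException {
        insert.setInt(1, id);
        insert.setString(2, name);
        insert.addBatch();
        pending++;
        if (pending >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered; on failure the whole batch is rolled back.
    void flush() throws SQLException {
        if (pending == 0) {
            return;
        }
        try {
            insert.executeBatch();
            conn.commit();
        } catch (SQLException e) {
            insert.clearBatch();
            conn.rollback();
            throw e;
        } finally {
            pending = 0;
        }
    }

    // Flushes the remainder and always releases the statement.
    void close() throws SQLException {
        try {
            flush();
        } finally {
            insert.close();
        }
    }
}
```

If the files map to several tables, one such writer per table (per group of similar XMLs) would follow the same pattern. The batch size is a tuning parameter; the right value depends on the driver and the database.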
Finally, and this is important: ensure that you release all resources such as file handles, database connections, etc. once you are done with processing or if you encounter errors. In simple words, use try-catch-finally at the correct places.
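A small sketch of that cleanup pattern, written the JDK 1.4 way (no try-with-resources). The class `StreamCleanup` and its methods are hypothetical names; the same shape applies to JDBC statements and connections.

```java
import java.io.IOException;
import java.io.InputStream;

class StreamCleanup {
    // Reads the stream to the end and returns the number of bytes seen.
    // The finally block runs whether reading succeeds or throws.
    static int countBytes(InputStream in) throws IOException {
        try {
            int total = 0;
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            return total;
        } finally {
            closeQuietly(in);
        }
    }

    // Closes the stream without letting a close failure hide the original error.
    static void closeQuietly(InputStream in) {
        if (in == null) {
            return;
        }
        try {
            in.close();
        } catch (IOException ignored) {
            // Nothing useful can be done here; real code might log it.
        }
    }
}
```

With thousands of messages every few minutes, a single leaked handle per failed message adds up quickly, which is why the close belongs in `finally` rather than at the end of the happy path.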
While by no means complete, I hope this answer gives you a set of critical checkpoints to consider while writing scalable, performant code.
Upvotes: 1