Shivraj Nimbalkar

Reputation: 159

Load huge XML data into SQL Server in parallel using Java

I want to load a huge XML file into SQL Server 2008. Before loading each record, I need to validate some of its fields against existing data in a different table in the database. For example, if I am loading userid and account details into the "user_account" table, I need to check that the userid is present in the "user" table.

I am doing this in Java. I plan to split the XML file into smaller files and load them in parallel using separate threads. Load errors are written to a log file, and I can use synchronization to keep the log file consistent.
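A minimal sketch of the plan described above, to make the question concrete. The names ParallelLoader, ErrorLog and loadAll are made up for illustration, the loadChunk callback stands in for whatever code validates and inserts one chunk file, and the error log is kept in memory here instead of in a real log file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: loads chunk files on a fixed thread pool and records failures.
final class ParallelLoader {

    // Synchronized methods keep entries from different threads from interleaving.
    // An in-memory list stands in for the log file (assumption for this sketch).
    static final class ErrorLog {
        private final List<String> lines = new ArrayList<>();

        synchronized void error(String message) {
            lines.add(message);
        }

        synchronized List<String> snapshot() {
            return new ArrayList<>(lines);
        }
    }

    // Runs loadChunk once per chunk file and blocks until every task has finished.
    static void loadAll(List<String> chunkFiles, Consumer<String> loadChunk,
                        int threads, ErrorLog log) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String file : chunkFiles) {
            pool.execute(() -> {
                try {
                    loadChunk.accept(file);
                } catch (RuntimeException e) {
                    // One failed chunk is logged and does not stop the others.
                    log.error(file + ": " + e.getMessage());
                }
            });
        }
        pool.shutdown(); // accept no new tasks, let the queued ones run
        try {
            // The one-hour limit is an arbitrary value chosen for this sketch.
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                pool.shutdownNow();
                throw new IllegalStateException("load did not finish in time");
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        }
    }
}
```

Each worker would typically need its own database connection, since a JDBC connection is generally not meant to be shared between threads.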

I would like to know whether this approach is sound. Please let me know if another approach would make the load faster or more efficient.

Upvotes: 0

Views: 279

Answers (1)

AcidJunkie

Reputation: 1918

Checking whether a record exists in the database from outside SQL Server is very time-consuming compared to doing the check directly in SQL Server. As an alternative, you could do the following:

  • Split the XML file into chunks, e.g. about 10 MB each. Ensure that each chunk still conforms to your XML schema.
  • Insert the chunks into an import table.
  • Kick off a stored procedure that reads all new XML chunks from the import table and does the comparison using the MERGE statement.
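The Java side of these steps could look roughly like the sketch below. Several things in it are assumptions: the class name XmlChunkStaging, the import table dbo.xml_import with a chunk column, and the stored procedure dbo.merge_xml_import are made-up names, chunks are cut by record count instead of by size to keep the code short, and records are assumed not to depend on namespace declarations from the original root element. The MERGE itself would live inside the stored procedure and is not shown.

```java
import java.io.Reader;
import java.io.StringWriter;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: cuts a large XML stream into small well-formed documents
// and stages them in an import table for a server-side MERGE.
final class XmlChunkStaging {

    // Streams the source with StAX and wraps every recordsPerChunk records in a new root.
    static List<String> split(Reader source, String rootName, String recordName,
                              int recordsPerChunk) {
        List<String> chunks = new ArrayList<>();
        try {
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLEventReader reader = inputFactory.createXMLEventReader(source);
            XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
            XMLEventFactory events = XMLEventFactory.newInstance();
            StringWriter buffer = null;
            XMLEventWriter writer = null;
            int depth = 0;          // nesting depth inside the current record
            int recordsInChunk = 0;
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (depth == 0) {
                    boolean recordStart = event.isStartElement()
                            && event.asStartElement().getName().getLocalPart().equals(recordName);
                    if (!recordStart) {
                        continue; // skip everything outside records, e.g. the original root
                    }
                    if (writer == null) { // first record of a new chunk
                        buffer = new StringWriter();
                        writer = outputFactory.createXMLEventWriter(buffer);
                        writer.add(events.createStartElement("", "", rootName));
                    }
                }
                writer.add(event);
                if (event.isStartElement()) depth++;
                if (event.isEndElement()) depth--;
                if (depth == 0) { // the record has just been closed
                    recordsInChunk++;
                    if (recordsInChunk == recordsPerChunk) {
                        chunks.add(finish(writer, events, rootName, buffer));
                        writer = null;
                        recordsInChunk = 0;
                    }
                }
            }
            if (writer != null) { // last, partially filled chunk
                chunks.add(finish(writer, events, rootName, buffer));
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("source is not well-formed XML", e);
        }
        return chunks;
    }

    private static String finish(XMLEventWriter writer, XMLEventFactory events,
                                 String rootName, StringWriter buffer) throws XMLStreamException {
        writer.add(events.createEndElement("", "", rootName));
        writer.flush();
        writer.close();
        return buffer.toString();
    }

    // Inserts the chunks as one batch, then lets the (hypothetical) procedure run the MERGE.
    static void stageAndMerge(Connection connection, List<String> chunks) throws SQLException {
        String insert = "INSERT INTO dbo.xml_import (chunk) VALUES (?)"; // assumed table and column
        try (PreparedStatement ps = connection.prepareStatement(insert)) {
            for (String chunk : chunks) {
                ps.setString(1, chunk);
                ps.addBatch();
            }
            ps.executeBatch();
        }
        try (CallableStatement call = connection.prepareCall("{call dbo.merge_xml_import}")) {
            call.execute();
        }
    }
}
```

For a really large file you would not collect all chunks in a list first. You would stage each chunk as soon as it is complete, so that only one chunk is held in memory at a time.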

The advantages of this are:

  • it is much faster than doing the check at record level in Java
  • chunks consume far less memory in SQL Server during XML parsing than one full-size XML document (e.g. 1 GB or bigger)

If you can define the XML format yourself, I suggest using the shortest possible element and attribute names, as this also saves a lot of memory in SQL Server during parsing.

Upvotes: 1
