Luka Krajnc

Reputation: 915

Parsing big XML file

I've checked many questions and didn't find an answer.

I have a big XML file that I need to parse. Currently I am parsing it with XMLReader. It worked well until I started inserting the data into an SQL database. If I only parse the XML and echo it, everything works fine; if I insert it, I get a 504 Gateway Time-out error. Here is a sample of my code:

Where I parse the XML:

$xml = new XMLReader();
$xml->open(APP_PATH_OWA."/trnUpload/TRNavteraData.xml");

while($xml->read()){
    // get products: start a new product at each <table> element
    if($xml->nodeType == XMLReader::ELEMENT && $xml->localName == 'table'){
        $product = array();
    }

    // <ident>: advance to the text node and read its value
    if($xml->nodeType == XMLReader::ELEMENT && $xml->localName == 'ident'){
        $xml->read();
        $product['id'] = $xml->value;
    }
    ...

Foreach:

foreach($products as $product){
    ...
    $productTitle = $product['title'];
    $productID = $product['id'];
    $productImageUrl = "http://www.example.com/logo.png";
    $productAttrHtml = $product['computed'];

    // after that I insert the data using Zend Framework.

The XML file is 300k+ lines long.

Whole php function: http://pastebin.com/S8A5Rdjw

Upvotes: 0

Views: 689

Answers (1)

ThW

Reputation: 19512

Serializing the process will decrease the memory consumption but increase the runtime. But I don't think time is the problem here.

You might just be blocking access to the database (table) with too many insert statements.

Some tips:

  1. Using a framework for database imports might be really slow if that kind of action is not supported by the framework. Try to avoid database abstractions for this.

  2. Make sure to use mass inserts. Most databases allow inserting several records at once, one way or another. This reduces the number of database calls. (Of course it increases the memory needed, so you will have to find a balance.)

  3. Check that the inserts do not block the selects. This depends on the database and, in the case of MySQL, on the table handler (storage engine).

  4. Insert into a separate table and rename the tables after that.

  5. Generate a file and use the database's command-line client to import it.
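
To illustrate tip 2, here is a minimal sketch of batching rows into multi-row INSERT statements. The helper name `buildBulkInsert`, the table name `products`, the column list, and the batch size are all assumptions for illustration; they are not taken from the question's code. The sketch only builds the SQL and the parameter list, so it runs without a database connection.

```php
<?php
// Hypothetical helper: builds one multi-row INSERT with positional placeholders.
// Table and column names are assumed to be trusted constants, not user input.
function buildBulkInsert(string $table, array $columns, array $rows): array
{
    // One "(?, ?, ...)" group per row.
    $group = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, count($rows), $group))
    );

    // Flatten the row values in column order so they match the placeholders.
    $params = [];
    foreach ($rows as $row) {
        foreach ($columns as $column) {
            $params[] = $row[$column] ?? null;
        }
    }

    return [$sql, $params];
}

// Sample data standing in for the parsed $products array.
$products = [
    ['id' => '1', 'title' => 'A'],
    ['id' => '2', 'title' => 'B'],
    ['id' => '3', 'title' => 'C'],
];

// Batch size is the balance mentioned in tip 2:
// larger batches mean fewer database calls but more memory per call.
$batchSize = 2;
$statements = [];
foreach (array_chunk($products, $batchSize) as $batch) {
    $statements[] = buildBulkInsert('products', ['id', 'title'], $batch);
}
```

With a real connection, each `[$sql, $params]` pair could be run through something like PDO's `prepare()` and `execute($params)`, ideally inside a single transaction. A batch size of a few hundred to a few thousand rows is a common starting point, but the right value depends on row size and database limits.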

Upvotes: 1
