Reputation: 3726
I'm trying to parse a moderately large XML file (6 MB) in PHP using SimpleXML. The script takes each record from the XML file, checks whether it has already been imported, and, if it hasn't, updates/inserts that record into my own db.
The problem is that I constantly get a fatal error about exceeding the memory allocation:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 256 bytes) in /.../system/database/drivers/mysql/mysql_result.php on line 162
I avoided that error by using the following line to increase the max memory allocation (following a tip from here):
ini_set('memory_limit', '-1');
However, I then run up against the 60-second max execution time, and, for whatever reason, my server (XAMPP on Mac OS X) won't let me increase it (the script simply won't run if I include a line like this):
set_time_limit(240);
This all seems very inefficient, though; shouldn't I be able to break the file up somehow and process it sequentially? In the controller below I have a count variable ($cycle) to keep track of which record I'm on, but I can't figure out how to use it in a way that doesn't still require processing the whole XML file.
The controller (I'm using CodeIgniter) has this basic structure:
$f = base_url().'data/data.xml';
if ($data = file_get_contents($f))
{
    $cycle = 0;
    $xml = new SimpleXMLElement($data);
    foreach ($xml->person as $p)
    {
        // this makes a single call to the db for a single field, based on the id of the record in the XML file
        if ($this->_notImported('source', $p['id']))
        {
            // various processes here, mainly breaking up the data for inserting into four different tables
        }
        $cycle++;
    }
}
Any thoughts?
To shed further light on what I'm doing: I'm grabbing most of the attributes of each element and subelement and inserting them into my db. For example, using my old code, I have something like this:
$insert = array(
    'indiv_name'      => $p['fullname'],
    'indiv_first'     => $p['firstname'],
    'indiv_last'      => $p['lastname'],
    'indiv_middle'    => $p['middlename'],
    'indiv_other'     => $p['namemod'],
    'indiv_full_name' => $full_name,
    'indiv_title'     => $p['title'],
    'indiv_dob'       => $p['birthday'],
    'indiv_gender'    => $p['gender'],
    'indiv_religion'  => $p['religion'],
    'indiv_url'       => $url
);
Given the suggestions below to use XMLReader, how could I parse the attributes of both the main element and its subelements?
Upvotes: 0
Views: 5949
Reputation: 67695
Use XMLReader.
Say your document is like this:
<test>
    <hello>world</hello>
    <foo>bar</foo>
</test>
With XMLReader:
$xml = new XMLReader;
$xml->open('doc.xml');
$xml->read(); // position the cursor on the root element

while ($xml->read()) {
    if ($xml->nodeType == XMLReader::ELEMENT) {
        print $xml->name.': ';
    } else if ($xml->nodeType == XMLReader::TEXT) {
        print $xml->value.PHP_EOL;
    }
}
This outputs:
hello: world
foo: bar
The nice thing is that you can also use expand() to fetch the current node as a DOMNode object.
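For example, a rough sketch of how expand() could answer the attribute question above. The <person> element and the id/fullname attributes come from the question; the file name, and the import into a throwaway DOMDocument (needed before simplexml_import_dom() will accept the node), are illustrative:

$xml = new XMLReader;
$xml->open('data.xml');

// Skip ahead to the first <person> element.
while ($xml->read() && $xml->name !== 'person');

while ($xml->name === 'person') {
    // expand() returns the current node as a DOMNode; importing it into a
    // DOMDocument lets SimpleXML wrap it.
    $doc = new DOMDocument;
    $p = simplexml_import_dom($doc->importNode($xml->expand(), true));

    // Attributes of the element and its subelements are now readable exactly
    // as in the original SimpleXML code, e.g.:
    print $p['id'].': '.$p['fullname'].PHP_EOL;

    // Jump to the next <person> sibling, skipping the subtree just handled.
    $xml->next('person');
}

$xml->close();

Only one <person> subtree is materialized at a time, so memory use stays flat no matter how large the file is.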
Upvotes: 6
Reputation: 908
It sounds like the problem is that you are reading the whole XML file into memory before trying to manipulate it. Use XMLReader to walk your way through the file as a stream instead of loading everything into memory at once.
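A minimal sketch of that streaming approach, reusing the _notImported() check and the data/data.xml path from the question (the <person> element and its id attribute are also taken from there):

$reader = new XMLReader;
$reader->open('data/data.xml');

while ($reader->read()) {
    // Only the current node is held in memory, so the file size no longer matters.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'person') {
        if ($this->_notImported('source', $reader->getAttribute('id'))) {
            // ... build the insert arrays and write to the db here ...
        }
    }
}

$reader->close();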
Upvotes: 4
Reputation: 186562
How about using JSON instead of XML? The data will be much smaller in JSON format, and I would imagine you won't run into the same memory issues because of that.
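For what it's worth, a sketch of what that could look like, assuming a hypothetical data/data.json version of the feed with the same fields. Note that json_decode() still parses the whole payload at once, so the savings here come from the smaller input, not from streaming:

// Hypothetical feed: [{"id": "...", "fullname": "...", ...}, ...]
$data = file_get_contents(base_url().'data/data.json');
$people = json_decode($data, true); // true => associative arrays

foreach ($people as $p) {
    if ($this->_notImported('source', $p['id'])) {
        // ... same insert logic as in the XML version ...
    }
}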
Upvotes: 1