Reputation: 679
I need to parse an XML that is big. f.ex 100mb (it can be even more).
For Example: Xml looks like this:
<notes>
<note>
<id>cdsds32da435-wufdhah</id>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
x 1000000 different notes(or even more)
</notes>
Each note has un unique ID. When I Parse an XML, I need to first find if note by specific ID exists in DB if no than INSERT it.
The problem is in Performance(it takes 2 hours). I try to take all ids from the DB (but is also big) with one SELECT, so I dont ask DB each time and I have them in PHP Array (Memory).
$sql = "SELECT id FROM 'notes'";
...
$ids = Array with all ids
I 've also parsed an XML with xml_parser in a loop:
while($data = fread($Xml, '512')) {
xml_parse($xmlParser, $data);
}
I think that parse an XML with simple_xml_parser may generate a too big variable for PHP to handle it.
And than when I have a note ID I check if it exists in $ids:
if (array_search($note->id, $ids) === FALSE) {
//than insert it
}
But it takes too long. So I found that PHP comes with special Arrays called Juddy Arrays http://php.net/manual/en/book.judy.php but I don't know exactly if they are for this - I mean for quick parse BIG Arrays.
I think also with Memcached, to store the ids from DB in many variables, but I want to find a proper solution.
In DB table there are also indexes, to speed up the process. The XML grows every week :) and it conatins every time all notes from the last XML plus new notes.
QUESTION? How to fast parse BIG ARRAYS in PHP? Are Judy Arrays for this? And storing all ids from DB in a variable is a good solution? - it can be to big for PHP in one time.
Upvotes: 1
Views: 539
Reputation: 39763
Are you sure you need to find if the item exists in the DB before inserting it? You can just tell the DB to 'insert it if it does not exist': put a unique key on the ID and use INSERT IGNORE
.
Upvotes: 1
Reputation: 3519
When I parsed DMOZ database (2G xml) I have been used Java solutions (SAX parser). First I was need to transfer a very big array of data from XML (RDF format) into MySQL database. My PHP solution performed this task over 6 hours. But Java solution was made similar task after 15 min. So I can tell you: try to use Java solution based on SAX parser.
Upvotes: 1