Reputation: 933
Given the XML below:
<Items>
<Item>...</Item>
<Item>...</Item>
<Item>...</Item>
<Item>...</Item>
</Items>
I am writing a function to return the count of all <Item> elements (4 in this case). The actual XML file is huge, and I don't want to load the entire thing into memory in order to parse it.
Using the command line, I managed to get what I need with the following line:
grep -o "<Item>" my_file.xml | wc -l
Is there an equivalent solution in PHP that I can use to get the same result?
Upvotes: 2
Views: 199
Reputation: 21492
It is easily done with XPath:
$doc = new DOMDocument();
$doc->load('my_file.xml', LIBXML_PARSEHUGE);
$xp = new DOMXPath($doc);
$count = (int) $xp->evaluate('count(//Item)'); // count() yields a float
The XPath expression returns the number of Item elements in the document.
The LIBXML_PARSEHUGE option only relaxes the parser's internal limits on nesting depth, entity recursion, and text node size. The DOM parser still loads the entire document into memory.
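To check the XPath expression before running it on the real file, here is a minimal sketch that applies it to the sample from the question. The inline string and the variable names are only for illustration; nothing here reduces memory use.

```php
<?php
// The sample document from the question, inlined as a string.
$xml = '<Items><Item>...</Item><Item>...</Item><Item>...</Item><Item>...</Item></Items>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$xp = new DOMXPath($doc);
// count() returns an XPath number, which PHP hands back as a float.
$count = (int) $xp->evaluate('count(//Item)');

echo $count, PHP_EOL; // prints 4
```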
For really huge files, use a SAX parser, which processes the XML sequentially and therefore keeps only a small portion of the document in memory at a time:
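$counter = 0;
$xml_parser = xml_parser_create();
// Case folding is enabled by default, so element names arrive upper-cased.
xml_set_element_handler($xml_parser, function ($parser, $name) use (&$counter) {
    if ($name === 'ITEM') {
        $counter++;
    }
}, null);
if (!($fp = fopen('my_file.xml', 'r'))) {
    die('Could not open XML input');
}
// Feed the file to the parser in 4 KB chunks.
while (!feof($fp)) {
    $data = fread($fp, 4096);
    if ($data === false) {
        die('Could not read XML input');
    }
    // The third argument tells the parser whether this is the final chunk.
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
fclose($fp);
xml_parser_free($xml_parser);
// $counter now holds the number of <Item> elements.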
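If the callback style feels awkward, PHP's XMLReader extension is another streaming option. This is a different technique from the SAX code above: a pull parser that you advance node by node. The sketch below wraps it in a function, since that is what the question asks for. The function name countElements and its parameters are hypothetical, and error handling is kept minimal.

```php
<?php
// Hypothetical helper: counts start tags with the given name while
// streaming through the file, so the document is never fully loaded.
function countElements(string $path, string $name): int
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Could not open XML input: $path");
    }

    $count = 0;
    // read() moves to the next node and returns false at the end of
    // the document or on a parse error.
    while ($reader->read()) {
        // Count only opening (or self-closing) elements, not end tags.
        if ($reader->nodeType === XMLReader::ELEMENT
            && $reader->localName === $name) {
            $count++;
        }
    }

    $reader->close();
    return $count;
}
```

Usage would be something like countElements('my_file.xml', 'Item'). Unlike the xml_parser_create() version, XMLReader does not fold names to upper case, so the name is matched exactly as written in the file.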
Upvotes: 1