Reputation: 179
I need to parse a large XML file (>1 GB) located on an FTP server. I have an FTP stream acquired via ftp_connect(). (I use this stream for other FTP-related actions.)
I know XMLReader is preferred for large XML files, but it only accepts a URI, so I assume a stream wrapper will be required. And the only FTP functions I know of that let me retrieve just a small part of the file are ftp_nb_fget() in combination with ftp_nb_continue().
However, I do not know how to put all of this together so that a minimum amount of memory is used.
Upvotes: 4
Views: 2068
Reputation: 317049
Hmm, I never tried that with FTP, but setting the stream context for libxml-based extensions can be done with libxml_set_streams_context(). Then just pass the FTP URI to XMLReader::open().
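A minimal sketch of what that could look like, assuming the ftp:// stream wrapper is enabled; the host, credentials, file name, and the <item> element are all placeholders:
$context = stream_context_create([
    'ftp' => [
        'resume_pos' => 0, // start reading from the beginning of the file
    ],
]);

// Tell libxml (and therefore XMLReader) to use this context for its streams.
libxml_set_streams_context($context);

$reader = new XMLReader();
$reader->open('ftp://user:pass@ftp.example.com/huge.xml');

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        // Process one element at a time; memory stays bounded because
        // XMLReader streams the document instead of loading it whole.
    }
}

$reader->close();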
EDIT: Note that you can use the stream context for other actions as well. If you are uploading files, you can probably use the same stream context in combination with file_put_contents(), so you don't necessarily need any of the ftp_* functions at all.
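Again a hedged sketch rather than tested code; 'overwrite' is a documented ftp:// context option, and the URI and $localXmlString are placeholders:
$context = stream_context_create([
    'ftp' => ['overwrite' => true], // allow replacing an existing remote file
]);

file_put_contents(
    'ftp://user:pass@ftp.example.com/upload.xml',
    $localXmlString,
    0,
    $context
);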
Upvotes: 0
Reputation: 165201
This will depend on the schema of your XML file. But if it's similar to RSS in that it's really just a long list of items (each encapsulated in a tag), then what I've done is to split out the individual sections and parse each one as its own DOMDocument:
$buffer = '';
while ($line = getLineFromFtp()) {
    $buffer .= $line;
    if (strpos($line, '</item>') !== false) {
        parseBuffer($buffer);
        $buffer = '';
    }
}
That's pseudocode (getLineFromFtp() and parseBuffer() are placeholders), but it's a lightweight way of handling a specific type of XML file without building your own XMLReader. You'd of course need to check for opening tags as well, to ensure that the buffer always contains a well-formed XML fragment; see the sketch below.
Note that this won't work with all XML types. But if it fits, it's an easy and clean way of doing it while keeping your memory footprint as low as possible...
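One way to flesh that out, as a sketch only: the ftp:// wrapper stands in for getLineFromFtp(), simplexml_load_string() stands in for parseBuffer(), and the URI and <item> tag are assumptions. Like the pseudocode above, it assumes the opening and closing tags appear on separate lines:
$fp = fopen('ftp://user:pass@ftp.example.com/huge.xml', 'rb');
$buffer = '';
$inItem = false;

while (($line = fgets($fp)) !== false) {
    if (strpos($line, '<item') !== false) {
        $inItem = true; // start buffering at the opening tag
    }
    if ($inItem) {
        $buffer .= $line;
    }
    if (strpos($line, '</item>') !== false) {
        // The buffer now holds one complete <item>...</item> fragment,
        // small enough to parse as a standalone document.
        $item = simplexml_load_string($buffer);
        // ... process $item ...
        $buffer = '';
        $inItem = false;
    }
}

fclose($fp);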
Upvotes: 0
Reputation: 51411
It looks like you may need to build on top of the low-level XML parser bits.
In particular, you can use xml_parse() to process the XML one chunk of the string at a time, after calling the various xml_set_* functions with callbacks to handle elements, character data, namespaces, entities, and so on. Those callbacks will be triggered whenever the parser detects that it has enough data to do so, which should mean that you can process the file as you read it in arbitrarily-sized chunks from the FTP site.
Proof of concept using the interactive CLI and xml_set_default_handler(), which will get called for everything that doesn't have a specific handler:
php > $p = xml_parser_create('utf-8');
php > xml_set_default_handler($p, function() { print_r(func_get_args()); });
php > xml_parse($p, '<a');
php > xml_parse($p, '>');
php > xml_parse($p, 'Foo<b>Bar</b>Baz');
Array
(
[0] => Resource id #3
[1] => <a>
)
Array
(
[0] => Resource id #3
[1] => Foo
)
Array
(
[0] => Resource id #3
[1] => <b>
)
Array
(
[0] => Resource id #3
[1] => Bar
)
Array
(
[0] => Resource id #3
[1] => </b>
)
php > xml_parse($p, '</a>');
Array
(
[0] => Resource id #3
[1] => Baz
)
Array
(
[0] => Resource id #3
[1] => </a>
)
php >
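To connect this back to the question, a minimal sketch of feeding the parser from the ftp:// wrapper in fixed-size chunks; the URI is a placeholder and the handler bodies are stubs:
$parser = xml_parser_create('UTF-8');

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) { /* opening tag */ },
    function ($parser, $name) { /* closing tag */ }
);
xml_set_character_data_handler($parser, function ($parser, $data) {
    /* text between tags */
});

$fp = fopen('ftp://user:pass@ftp.example.com/huge.xml', 'rb');

while (!feof($fp)) {
    $chunk = fread($fp, 8192); // only 8 KB held in memory at a time
    if (xml_parse($parser, $chunk, feof($fp)) === 0) {
        die(xml_error_string(xml_get_error_code($parser)));
    }
}

fclose($fp);
xml_parser_free($parser);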
Upvotes: 0