Reputation: 298
I'm using the built-in XMLReader in PHP to read data from external XML feeds. When I try to read a feed that starts with a new line, I get the following error:
ErrorException: XMLReader::read(): http://example.com/feeds/feed1.xml:2: parser error : XML declaration allowed only at the start of the document
I think it's because the feed starts with a new line, but I don't know how to solve the problem. How can I make it skip the first line if it contains only a newline?
I can't seem to find anyone who has solved this problem. There are some workarounds using SimpleXMLElement, but I can't load the entire document into memory.
Here is my code:
$reader = new XMLReader;
$reader->open($linkToExternalFeed);
while ($reader->read() && $reader->name != 'item');
while ($reader->name == 'item')
{
    $node = new SimpleXMLElement($reader->readOuterXML());
    $this->doSomeParsing($node);
    unset($node);
    $reader->next($reader->name);
}
$reader->close();
Upvotes: 0
Views: 626
Reputation: 19502
You could write a stream wrapper that filters the stream. Once it finds the first non-whitespace content, it removes the filter and passes the remaining data straight through to XMLReader.
class ResourceWrapper {
    private $_stream;
    private $_filter;
    private $context;

    public static function createContext(
        $stream, callable $filter = NULL, string $protocol = 'myproject-resource'
    ): array {
        self::register($protocol);
        return [
            $protocol.'://context',
            \stream_context_create(
                [
                    $protocol => [
                        'stream' => $stream,
                        'filter' => $filter
                    ]
                ]
            )
        ];
    }

    private static function register($protocol) {
        if (!\in_array($protocol, \stream_get_wrappers(), TRUE)) {
            \stream_wrapper_register($protocol, __CLASS__);
        }
    }

    public function removeFilter() {
        $this->_filter = NULL;
    }

    public function url_stat(string $path , int $flags): array {
        return [];
    }

    public function stream_open(
        string $path, string $mode, int $options, &$opened_path
    ): bool {
        list($protocol, $id) = \explode('://', $path);
        $context = \stream_context_get_options($this->context);
        if (
            isset($context[$protocol]['stream']) &&
            \is_resource($context[$protocol]['stream'])
        ) {
            $this->_stream = $context[$protocol]['stream'];
            $this->_filter = $context[$protocol]['filter'];
            return TRUE;
        }
        return FALSE;
    }

    public function stream_read(int $count) {
        if (NULL !== $this->_filter) {
            $filter = $this->_filter;
            return $filter(\fread($this->_stream, $count), $this);
        }
        return \fread($this->_stream, $count);
    }

    public function stream_eof(): bool {
        return \feof($this->_stream);
    }
}
Usage:
$xml = <<<'XML'
<?xml version="1.0"?>
<person><name>Alice</name></person>
XML;
// open the example XML string as a file stream
$resource = fopen('data://text/plain;base64,'.base64_encode($xml), 'rb');
$reader = new \XMLReader();
// create context for the stream and the filter
list($uri, $context) = \ResourceWrapper::createContext(
    $resource,
    function($data, \ResourceWrapper $wrapper) {
        // check for content after removing leading white space
        if (ltrim($data) !== '') {
            // found content, remove filter
            $wrapper->removeFilter();
            // return data without leading whitespace
            return ltrim($data);
        }
        return '';
    }
);
libxml_set_streams_context($context);
$reader->open($uri);
while ($foundNode = $reader->read()) {
    var_dump($reader->localName);
}
Output:
string(6) "person"
string(4) "name"
string(5) "#text"
string(4) "name"
string(6) "person"
Upvotes: 2
Reputation: 57121
Not ideal, but this reads the source, applies ltrim() to the first chunk of the content, and writes the result to a temporary file. You should then be able to open $tmpFile with XMLReader.
...
$tmpFile = tempnam(".", "trx");
$fpIn = fopen($linkToExternalFeed,"r");
$fpOut = fopen($tmpFile, "w");
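// Only the first chunk is trimmed, so this assumes the leading
// whitespace fits within the first 4096 bytes.
$buffer = fread($fpIn, 4096);
fwrite($fpOut, ltrim($buffer));
// Compare explicitly so that a chunk consisting of "0" does not end the loop early.
while (($buffer = fread($fpIn, 4096)) !== false && $buffer !== '') {
    fwrite($fpOut, $buffer);
}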
fclose($fpIn);
fclose($fpOut);
I use tempnam() to generate a unique file name; you could use any name you are happy with. It is also worth deleting this file once you've processed it, to save space and remove potentially sensitive information.
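Here is a rough sketch of how the cleaned copy might then be read and deleted, reusing the `item` loop from the question. The helper name `copyWithoutLeadingWhitespace()` and the sample feed are made up for illustration, and the feed is supplied through a `data://` URI only so that the snippet runs without network access. Unlike the snippet above, this variant keeps trimming until it sees the first non-whitespace byte, in case the leading whitespace spans more than one chunk.

```php
<?php
// Hypothetical helper: copy $source to a temp file, dropping leading whitespace.
function copyWithoutLeadingWhitespace(string $source): string
{
    $tmpFile = tempnam(sys_get_temp_dir(), 'trx');
    $fpIn = fopen($source, 'rb');
    $fpOut = fopen($tmpFile, 'wb');
    $started = false;
    while (!feof($fpIn)) {
        $buffer = fread($fpIn, 4096);
        if ($buffer === false) {
            break; // read error, stop copying
        }
        if (!$started) {
            // Keep trimming until the first non-whitespace byte shows up.
            $buffer = ltrim($buffer);
            $started = ($buffer !== '');
        }
        fwrite($fpOut, $buffer);
    }
    fclose($fpIn);
    fclose($fpOut);
    return $tmpFile;
}

// Sample feed with a leading newline (invented for this example).
$feed = "\n<?xml version=\"1.0\"?>\n"
    . "<rss><channel><item><title>A</title></item>"
    . "<item><title>B</title></item></channel></rss>";
$tmpFile = copyWithoutLeadingWhitespace(
    'data://text/plain;base64,' . base64_encode($feed)
);

$titles = [];
$reader = new XMLReader();
$reader->open($tmpFile);
try {
    // Skip ahead to the first <item>, then walk its siblings one at a time.
    while ($reader->read() && $reader->name !== 'item');
    while ($reader->name === 'item') {
        $node = new SimpleXMLElement($reader->readOuterXml());
        $titles[] = (string) $node->title;
        $reader->next('item');
    }
} finally {
    // Always close the reader and remove the temporary copy.
    $reader->close();
    unlink($tmpFile);
}
print_r($titles);
```

The `finally` block is there so the temporary file is removed even if parsing throws part-way through the feed.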
Upvotes: 0