Reputation: 128
I am working with an XML file that looks like this:
<text>
<paragraph/>
First text
<paragraph/>
Second text
</text>
<text>
<paragraph/>
Third text
<paragraph/>
Fourth text
</text>
I need to get the value of each text element, but the result should be in 4 rows, so that every <paragraph/>
element starts a new row:
1 | First text
2 | Second text
3 | Third text
4 | Fourth text
My code:
$filexml = File::get('../file.xml');
$xml = simplexml_load_string($filexml);
for ($i=1; $i < count($xml->text) + 1; $i++) {
    foreach ($xml->text as $text_item) {
        echo $i++." | ".$text_item."<br/>";
    }
}
My result:
1 | First text Second text
2 | Third text Fourth text
What should I do next? Or is there a different approach that would achieve the desired result?
Upvotes: 0
Views: 77
Reputation: 19492
SimpleXML does not work well with mixed child nodes. You will need to use DOM for that. You can use an XPath expression to fetch the nodes (text nodes are nodes, too).
//text/*|//text/text()[normalize-space(.) != ""]
This expression selects any child element node or any text node (this includes CDATA sections) inside a text
element. It ignores text nodes that contain only whitespace.
The result is a list of nodes that you can iterate with foreach. Check if the node is a separator (a paragraph
element node). If it is, output the buffer; otherwise, add the text content of the node to the buffer.
// $xml is expected to hold the XML source as a string
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$buffer = '';
$counter = 0;
foreach ($xpath->evaluate('//text/*|//text/text()[normalize-space(.) != ""]') as $node) {
    if ($node instanceof DOMElement && $node->localName === 'paragraph') {
        // a separator: flush whatever text was collected so far
        if ($buffer !== '') {
            echo ++$counter, ' | ', trim($buffer), "\n";
            $buffer = '';
        }
    } else {
        $buffer .= $node->textContent;
    }
}
// flush the text after the last separator
if ($buffer !== '') {
    echo ++$counter, ' | ', trim($buffer), "\n";
}
Output:
1 | First text
2 | Second text
3 | Third text
4 | Fourth text
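To see why SimpleXML falls short with mixed content, here is a minimal sketch. The sample string and the `<root>` wrapper are assumptions made for illustration, not the original file.

```php
<?php
// Hypothetical sample: the question's structure on one line, wrapped in an assumed <root> element.
$sample = '<root><text><paragraph/>First text<paragraph/>Second text</text></root>';
$xml = simplexml_load_string($sample);

// Casting the element to a string joins its text nodes into one value,
// so the position of each <paragraph/> relative to the text is lost.
$joined = (string) $xml->text;

// The separators are still reachable as child elements, but only as a plain list.
$separators = count($xml->text->paragraph);

var_dump($joined, $separators);
```

The element exposes its text as one string and its child elements as a separate list, which is why the DOM node list above is the easier route here.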
Upvotes: 0
Reputation: 163277
You could use DOMDocument and DOMXPath. In the expression, you can select the text nodes using text().
Then you can loop over those and skip the ones that are empty after trimming.
$filexml = File::get('../file.xml');
$doc = new DOMDocument();
$doc->loadXML($filexml);
$xpath = new DOMXpath($doc);
$i = 1;
$expression = "//text/text()";
foreach ($xpath->query($expression) as $text) {
    $result = trim($text->nodeValue);
    if ($result !== "") {
        echo sprintf("%d | %s<br>", $i++, $result);
    }
}
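One caveat: a well-formed XML document has exactly one root element, so the snippet exactly as posted in the question (two top-level text elements) would be rejected by loadXML. Below is a self-contained sketch that wraps such a fragment first. The helper name `extractRows` and the `<root>` wrapper are assumptions for illustration, and it assumes the fragment carries no XML declaration.

```php
<?php
// Hypothetical helper: the name and the <root> wrapper are illustrative assumptions.
function extractRows(string $fragment): array
{
    // Collect parse errors internally instead of emitting PHP warnings.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Assumes $fragment has no XML declaration, so it can be wrapped safely.
    if (!$doc->loadXML('<root>' . $fragment . '</root>')) {
        libxml_clear_errors();
        return [];
    }
    $xpath = new DOMXPath($doc);
    $rows = [];
    // Let XPath drop the whitespace-only text nodes, then trim the rest.
    foreach ($xpath->query('//text/text()[normalize-space(.) != ""]') as $textNode) {
        $rows[] = trim($textNode->nodeValue);
    }
    return $rows;
}

$fragment = "<text>\n<paragraph/>\nFirst text\n<paragraph/>\nSecond text\n</text>\n"
          . "<text>\n<paragraph/>\nThird text\n<paragraph/>\nFourth text\n</text>";

foreach (extractRows($fragment) as $i => $row) {
    echo $i + 1, ' | ', $row, "\n";
}
```

Returning an array instead of echoing directly keeps the numbering and the output format (`<br>` or a newline) up to the caller.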
Upvotes: 1
Reputation: 6393
Okay, this isn't particularly pretty, and I suggest you still give this a try using XPath, but here goes...
<?php
$filexml = "<root>
<text>
<paragraph/>
First text
<paragraph/>
Second text
</text>
<text>
<paragraph/>
Third text
<paragraph/>
Fourth text
</text>
</root>";
$xml = simplexml_load_string($filexml);
$i=1;
foreach($xml->text as $textNode)
{
    $textCounter = 1;
    foreach ($textNode->paragraph as $text_item) {
        echo $i++." | ".trim(explode(PHP_EOL.PHP_EOL, (string)$textNode)[$textCounter++])."<br/>";
    }
}
?>
You were basically on the right track, but your inner loop needs to iterate over the paragraph nodes, not the text nodes again. You also then need to be able to split apart the text within the text nodes. If the file really does have everything on individual lines, then you're fine, as you can split on newlines. If it doesn't (everything on one line), then this won't work.
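For the one-line case, a sketch that does not depend on newlines is to walk the child nodes of each text element with DOM and keep the non-empty text nodes. The function name `rowsFromChildNodes` and the sample string are assumptions for illustration.

```php
<?php
// Hypothetical helper: walks child nodes instead of splitting the text on newlines.
function rowsFromChildNodes(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $rows = [];
    foreach ($doc->getElementsByTagName('text') as $textElement) {
        foreach ($textElement->childNodes as $child) {
            // DOMText covers plain text nodes (DOMCdataSection extends it as well).
            if ($child instanceof DOMText) {
                $value = trim($child->nodeValue);
                if ($value !== '') {
                    $rows[] = $value;
                }
            }
        }
    }
    return $rows;
}

// Everything on one line: there are no newlines to split on.
$oneLine = '<root><text><paragraph/>First text<paragraph/>Second text</text>'
         . '<text><paragraph/>Third text<paragraph/>Fourth text</text></root>';

print_r(rowsFromChildNodes($oneLine));
```

Because each run of text between two paragraph elements is its own node, the layout of the file no longer matters.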
Upvotes: 0
Reputation: 108
Try changing this:
<text>
<paragraph/>
First text
<paragraph/>
Second text
</text>
<text>
<paragraph/>
Third text
<paragraph/>
Fourth text
</text>
to this:
<text>
<paragraph/>
First text
<paragraph/>
</text>
<text>
<paragraph/>
Two text
<paragraph/>
</text>
<text>
<paragraph/>
Three text
<paragraph/>
</text>
<text>
<paragraph/>
Four text
<paragraph/>
</text>
Upvotes: 0