sunergos

Reputation: 128

Getting XML content within one element and splitting the result by other subelements

I work with an XML file that looks like this:

<text>
  <paragraph/>
    First text
  <paragraph/>
    Second text
</text>
<text>
  <paragraph/>
    Third text
  <paragraph/>
    Fourth text
</text>

I need to get the value of the text elements, but the result should be in 4 rows. Every <paragraph/> element starts a new row:

1 | First text
2 | Second text
3 | Third text
4 | Fourth text

My code:

$filexml = File::get('../file.xml');

$xml = simplexml_load_string($filexml);

for ($i=1; $i < count($xml->text) + 1; $i++) {

    foreach ($xml->text as $text_item) {
        echo $i++." | ".$text_item."<br/>";
    }

}

My result:

1 | First text Second text
2 | Third text Fourth text

What should I do next? Or is there a different approach to achieve the desired result?

Upvotes: 0

Views: 77

Answers (4)

ThW

Reputation: 19492

SimpleXML does not work well with mixed content (child elements interleaved with text). You will need to use DOM for that. With an XPath expression you can fetch the nodes directly (text nodes are nodes, too).
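To see the difference, here is a minimal sketch using a one-line version of the question's first text element (the <root> wrapper is an assumption, since an XML document needs a single root element). Casting the SimpleXML element to a string concatenates its direct text nodes, so the paragraph boundary disappears; DOM's childNodes keeps every node separate.

```php
<?php
// Sample mirrors the question; the <root> wrapper is an assumption.
$xml = '<root><text><paragraph/>First text<paragraph/>Second text</text></root>';

// SimpleXML: the string cast joins all direct text nodes of <text>,
// so nothing marks where one paragraph ends and the next begins.
$simple = simplexml_load_string($xml);
$merged = (string) $simple->text[0];

// DOM: childNodes lists each element and text node separately, in document order.
$doc = new DOMDocument();
$doc->loadXML($xml);
$parts = [];
foreach ($doc->getElementsByTagName('text')->item(0)->childNodes as $child) {
    $parts[] = $child instanceof DOMText ? $child->nodeValue : '<' . $child->nodeName . '/>';
}

echo $merged, "\n";              // First textSecond text
echo implode(' ', $parts), "\n"; // <paragraph/> First text <paragraph/> Second text
```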

The expression //text/*|//text/text()[normalize-space(.) != ""] matches every child element node and every text node (including CDATA sections) inside a text element. The predicate drops text nodes that contain only whitespace.
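A small sketch of what that expression selects, assuming an indented sample modelled on the question's file and wrapped in a hypothetical <root> element. Elements are shown as tags and text nodes are trimmed; the whitespace-only node before the first <paragraph/> is filtered out by the predicate.

```php
<?php
// Indented sample modelled on the question; <root> is an assumed wrapper.
$xml = <<<XML
<root>
  <text>
    <paragraph/>
      First text
    <paragraph/>
      Second text
  </text>
</root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$matched = [];
foreach ($xpath->evaluate('//text/*|//text/text()[normalize-space(.) != ""]') as $node) {
    // Element nodes are recorded as tags; text nodes are trimmed for readability.
    $matched[] = $node instanceof DOMElement ? '<' . $node->localName . '/>' : trim($node->textContent);
}
print_r($matched);
```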

The result is a list of nodes that you can iterate with foreach. For each node, check whether it is a separator (a paragraph element node). If it is, output the buffer; otherwise, append the node's text content to the buffer.

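// $xml holds the XML string (it needs a single root element, e.g. <root>).
$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

$buffer = '';
$counter = 0;
// Element children and non-whitespace text nodes of every <text>, in document order.
foreach ($xpath->evaluate('//text/*|//text/text()[normalize-space(.) != ""]') as $node) {
  if ($node instanceof DOMElement && $node->localName === 'paragraph') {
    // A <paragraph/> separator: output the collected text, if any, and reset.
    if ($buffer !== '') {
      echo ++$counter, ' | ', trim($buffer), "\n";
      $buffer = '';
    }
  } else {
    // Any other node: collect its text content.
    $buffer .= $node->textContent;
  }
}
// Output the text after the last separator.
if ($buffer !== '') {
  echo ++$counter, ' | ', trim($buffer), "\n";
}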

Output:

1 | First text
2 | Second text
3 | Third text
4 | Fourth text

Upvotes: 0

The fourth bird

Reputation: 163277

You could use DOMDocument and DOMXPath. In the XPath expression, text() selects the text nodes.

Then loop over them and skip the ones that are empty after trimming.

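$filexml = File::get('../file.xml'); // Laravel; use file_get_contents() in plain PHP
$doc = new DOMDocument();
$doc->loadXML($filexml);
$xpath = new DOMXPath($doc);
$i = 1;
// Every text node directly inside a <text> element.
$expression = "//text/text()";
foreach ($xpath->query($expression) as $text) {
    $result = trim($text->nodeValue);
    // Skip the whitespace-only nodes produced by indentation.
    if ($result !== "") {
        printf("%d | %s<br>", $i++, $result);
    }
}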


Upvotes: 1

Patrick Q

Reputation: 6393

Okay, this isn't particularly pretty, and I'd still suggest trying XPath, but here goes...

<?php

$filexml = "<root>
<text>
<paragraph/>
First text
<paragraph/>
Second text
</text>
<text>
<paragraph/>
Third text
<paragraph/>
Fourth text
</text>
</root>";

$xml = simplexml_load_string($filexml);
$i=1;

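foreach ($xml->text as $textNode)
{
    // (string)$textNode joins the text nodes around the <paragraph/> elements.
    // With one item per line they are separated by blank lines; XML parsers
    // normalize line endings to "\n", so split on "\n\n" rather than PHP_EOL.
    $textCounter = 1;
    foreach ($textNode->paragraph as $text_item) {
        echo $i++." | ".trim(explode("\n\n", (string)$textNode)[$textCounter++])."<br/>";
    }
}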


?>

You were basically on the right track, but your inner loop needs to iterate over the paragraph nodes, not over the text nodes again. You then also need to split apart the text inside each text node. If the file really has each piece of text on its own line, you're fine, because the pieces end up separated by blank lines and you can split on those. If it doesn't (for example, everything is on one line), this won't work.
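If the line layout cannot be relied on, one option is to keep SimpleXML for the outer loop, hand each text element to DOM with dom_import_simplexml(), walk its child nodes, and start a new row at every <paragraph/>. This is a sketch rather than Patrick Q's code; the function name paragraphRows() and the one-line sample XML are hypothetical.

```php
<?php
// Hypothetical helper: returns one row per <paragraph/>-delimited chunk of text.
function paragraphRows(string $xmlString): array
{
    $rows = [];
    $xml = simplexml_load_string($xmlString);
    foreach ($xml->text as $textNode) {
        $buffer = '';
        // dom_import_simplexml() exposes the same element as a DOMElement,
        // whose childNodes keep the text nodes separate.
        foreach (dom_import_simplexml($textNode)->childNodes as $child) {
            if ($child instanceof DOMElement && $child->localName === 'paragraph') {
                // Each <paragraph/> ends the row collected so far.
                if (trim($buffer) !== '') {
                    $rows[] = trim($buffer);
                }
                $buffer = '';
            } else {
                // Text (and any other) nodes are appended to the current row.
                $buffer .= $child->textContent;
            }
        }
        // Flush the last row of this <text> element.
        if (trim($buffer) !== '') {
            $rows[] = trim($buffer);
        }
    }
    return $rows;
}

// Everything on one line: the case the explode() approach cannot handle.
$filexml = '<root><text><paragraph/>First text<paragraph/>Second text</text>'
         . '<text><paragraph/>Third text<paragraph/>Fourth text</text></root>';

foreach (paragraphRows($filexml) as $i => $row) {
    echo ($i + 1) . " | " . $row . "<br/>";
}
```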

Upvotes: 0

nun3z

Reputation: 108

Try changing this:

<text>
  <paragraph/>
    First text
  <paragraph/>
    Second text
</text>
<text>
  <paragraph/>
    Third text
  <paragraph/>
    Fourth text
</text>

to this:

<text>
  <paragraph/>
    First text
  <paragraph/>
</text>
<text>
  <paragraph/>
    Two text
  <paragraph/>
</text>
<text>
  <paragraph/>
    Three text
  <paragraph/>
</text>
<text>
  <paragraph/>
    Four text
  <paragraph/>
</text>
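If you control the file format, this layout gives each text element exactly one piece of text, so the question's SimpleXML approach works with a single loop. A minimal sketch, assuming the elements are wrapped in a hypothetical <root> element (written on one line per element here for brevity):

```php
<?php
// The restructured XML from above; the <root> wrapper is an assumption.
$filexml = '<root>'
    . '<text><paragraph/>First text<paragraph/></text>'
    . '<text><paragraph/>Two text<paragraph/></text>'
    . '<text><paragraph/>Three text<paragraph/></text>'
    . '<text><paragraph/>Four text<paragraph/></text>'
    . '</root>';

$xml = simplexml_load_string($filexml);
$rows = [];
foreach ($xml->text as $textItem) {
    // (string) joins the direct text nodes; trim() drops any layout whitespace.
    $rows[] = trim((string) $textItem);
}

foreach ($rows as $i => $row) {
    echo ($i + 1) . " | " . $row . "<br/>";
}
```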

Upvotes: 0
