NorthGuard

Reputation: 963

PHP Dealing with missing XML data

If I have three sets of data, say:

<note><from>Me</from><to>someone</to><message>hello</message></note>

<note><from>Me</from><to></to><message>Need milk & eggs</message></note>

<note><from>Me</from><message>Need milk & eggs</message></note>

and I'm using SimpleXML, is there a way to have it detect an empty or absent tag automatically?

I would like the output to be:

FROM    TO     MESSAGE
Me    someone    hello
Me    NULL    Need milk & eggs
Me    NULL    Need milk & eggs

Right now I'm doing it manually and I quickly realised that it's going to take a very long time to do it for long xml files.

My current sample code:

$xml = simplexml_load_string($string);
$out = "";
if ($xml->from != "") {$out .= $xml->from."\t";} else {$out .= "NULL\t";}
//repeat for all children, checking by name

Sometimes the order is different as well; there might be an XML document with:

<note><message>pick up cd</message><from>me</from></note>

so iterating through the children and checking by index count doesn't work.

The actual xml files I'm working with are thousands of lines each, so I obviously can't just code in every tag.

Upvotes: 3

Views: 1379

Answers (2)

andyb

Reputation: 43823

You could use DOMDocument instead. I have created a quick demo that splits the <note> elements into an array, using the XML tag names as keys. You could then iterate over the resulting array to create your output.

I corrected the invalid XML by replacing the bare ampersand with its XML entity (&amp;).

<?php
    libxml_use_internal_errors(true);
    $xml = <<<XML
<notes>
<note><from>Me</from><to>someone</to><message>hello</message></note>
<note><from>Me</from><to></to><message>Need milk &amp; eggs</message></note>
<note><from>Me</from><message>Need milk &amp; eggs</message></note>
<note><message>pick up cd</message><from>me</from></note>
</notes>
XML;

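    // Build one associative array per <note>, keyed by child tag name.
    function getNotes($nodelist) {
        $notes = array();

        foreach ($nodelist as $node) {
            $noteParts = array();

            foreach ($node->childNodes as $child) {
                // Skip text nodes (e.g. whitespace between tags); they have no tag name.
                if ($child->nodeType !== XML_ELEMENT_NODE) {
                    continue;
                }
                $noteParts[$child->tagName] = $child->nodeValue;
            }

            $notes[] = $noteParts;
        }

        return $notes;
    }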

    $dom = new DOMDocument();
    $dom->recover = true;
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    $nodelist = $xpath->query("//note");
    $notes = getNotes($nodelist);

    print_r($notes);
?>

Edit: If you change $noteParts = array(); to $noteParts = array('from' => null, 'to' => null, 'message' => null); then it will always create the full set of keys.
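To get from that idea to the tab-separated output in the question, something like the following sketch might work. The helper names noteToRow and renderRows are made up for illustration, the column list is assumed to be known up front, and an empty tag is treated the same as a missing one because that is what the desired output shows.

```php
<?php
// Hypothetical helper: turns one <note> element into a row with a fixed set of columns.
function noteToRow(DOMElement $note, array $columns): array
{
    // Start with every column set to null so absent tags still get a key.
    $row = array_fill_keys($columns, null);

    foreach ($note->childNodes as $child) {
        // Skip text nodes and any tag that is not one of the requested columns.
        if ($child->nodeType !== XML_ELEMENT_NODE || !array_key_exists($child->nodeName, $row)) {
            continue;
        }
        $value = trim($child->nodeValue);
        // Assumption: an empty tag counts as missing.
        $row[$child->nodeName] = ($value === '') ? null : $value;
    }

    return $row;
}

// Hypothetical helper: renders rows as tab-separated lines, printing NULL for null values.
function renderRows(array $rows, array $columns): string
{
    $lines = array(strtoupper(implode("\t", $columns)));
    foreach ($rows as $row) {
        $cells = array();
        foreach ($columns as $column) {
            $cells[] = ($row[$column] === null) ? 'NULL' : $row[$column];
        }
        $lines[] = implode("\t", $cells);
    }
    return implode("\n", $lines) . "\n";
}

$xml = <<<XML
<notes>
<note><from>Me</from><to>someone</to><message>hello</message></note>
<note><from>Me</from><to></to><message>Need milk &amp; eggs</message></note>
<note><message>pick up cd</message><from>me</from></note>
</notes>
XML;

$columns = array('from', 'to', 'message');
$dom = new DOMDocument();
$dom->loadXML($xml);

$rows = array();
foreach ($dom->getElementsByTagName('note') as $note) {
    $rows[] = noteToRow($note, $columns);
}

echo renderRows($rows, $columns);
```

Because values are looked up by tag name, the order of the children inside each <note> does not matter. For files with many different tags, the column list could probably be built in a first pass that collects every tag name it sees, rather than being written out by hand.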

Upvotes: 1

Spudley

Reputation: 168803

It sounds like you need a DTD (Document Type Definition), which will define the required format of the XML file, and specify which elements are required, optional, what they can contain, etc.

DTDs can be used to validate an XML file before you do any processing with it.

Unfortunately, PHP's SimpleXML extension has no validation method of its own, but DOMDocument does (DOMDocument::validate()), so you may want to use that instead.
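As a rough sketch of what that could look like, the snippet below prepends an inline DTD to a single <note> and calls DOMDocument::validate(). The helper name isValidNote and the DTD itself are assumptions made for illustration: here <from> and <message> are required, <to> is optional, and the elements must appear in that order.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: wraps the XML in an inline DTD and reports whether it validates.
function isValidNote(string $noteXml): bool
{
    // Assumed DTD: <from> and <message> required, <to> optional, in this order.
    $dtd = <<<DTD
<!DOCTYPE note [
<!ELEMENT note (from, to?, message)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT message (#PCDATA)>
]>
DTD;

    $dom = new DOMDocument();
    if (!$dom->loadXML($dtd . "\n" . $noteXml)) {
        libxml_clear_errors();
        return false; // not even well-formed
    }

    // validate() checks the document against the DTD declared in its DOCTYPE.
    $valid = $dom->validate();
    libxml_clear_errors();

    return $valid;
}

var_dump(isValidNote('<note><from>Me</from><to>someone</to><message>hello</message></note>'));
var_dump(isValidNote('<note><from>Me</from><message>hello</message></note>'));
var_dump(isValidNote('<note><from>Me</from><to>someone</to></note>'));
```

Two caveats with this sketch: a sequence content model like the one above fixes the element order, so notes with <message> before <from> would be rejected, and an empty <to></to> still validates because #PCDATA allows empty content. Validation tells you whether a document matches the expected shape; it does not fill in missing values for you.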

I'll leave it as a separate exercise for you to research how to create a DTD file. If you need more help with that, I'd suggest asking it as a separate question.

Upvotes: 2
