Reputation: 135
I've made a simple tool that lets you fill in an input field with a URL for an XML file. It's supposed to show all the nodes so the user can match them with database fields, which I have working for an XML file that has 2 "primary" nodes. Example of the XML file:
<foods>
<food>
<name>ravioli</name>
<recipe>food.com/ravioli</recipe>
<time>10 minutes</time>
</food>
<food>
<name>ravioli</name>
<recipe>food.com/ravioli</recipe>
<time>10 minutes</time>
</food>
</foods>
This returns me a list that says
name
recipe
time
The problem is when someone wants to use an XML file that doesn't have 2 "primary" nodes, for example one that is missing the <food> node. In that case nothing is shown, because my PHP code expects 2 "primary" nodes instead of 1.
My code is as follows:
// Fetch the XML from the URL
if (!$xml = simplexml_load_file($_GET['url'])) {
    // The XML file could not be reached
    echo 'Error loading XML. Please check the URL.';
} else {
    // Parse through the XML and fetch the nodes
    $child = $xml->children();
    foreach ($child->children() as $key => $value) {
        echo $key."<br>";
    }
}
Is there a way to get the nodes I want from any XML file, regardless of the amount of parent nodes?
Upvotes: 1
Views: 1591
Reputation: 19482
You can query data from an XML DOM using Xpath. In PHP it is available through the DOMXpath::evaluate() method. The second argument is the context node, so your expressions can be relative to another node. Converting the XML to a list of records (for a database, CSV, ...) requires several steps. Start with some bootstrap:
$xml = <<<'XML'
<foods>
<food>
<name>ravioli 1</name>
<recipe>food.com/ravioli-1</recipe>
<time unit="minutes">10</time>
</food>
<food>
<name>ravioli 2</name>
<recipe>food.com/ravioli-2</recipe>
<time unit="minutes">11</time>
</food>
</foods>
XML;
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
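To illustrate the context argument on its own, here is a minimal sketch. The shortened XML string and the $names variable are made up for this illustration; only DOMXpath::evaluate() itself comes from the approach described here.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<foods><food><name>ravioli 1</name></food><food><name>ravioli 2</name></food></foods>'
);
$xpath = new DOMXPath($dom);

$names = [];
// Absolute expression: evaluated from the document itself.
foreach ($xpath->evaluate('/foods/food') as $food) {
    // Relative expression: "name" is resolved against the current <food> node,
    // because that node is passed as the second (context) argument.
    $names[] = $xpath->evaluate('string(name)', $food);
}
var_dump($names);
```

Wrapping the expression in string() makes evaluate() return a plain string instead of a node list.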
First we need to define which XML element represents a record, then which elements represent the fields.
So let's build lists of possible record paths and field paths:
$paths = [];
$leafs = [];
foreach ($xpath->evaluate('//*|//@*') as $node) {
    // A possible record path has attributes or child elements
    $isPath = $xpath->evaluate('count(@*|*) > 0', $node);
    // A leaf (possible field) has no child elements
    $isLeaf = !($xpath->evaluate('count(*) > 0', $node));
    $path = '';
    foreach ($xpath->evaluate('ancestor::*', $node) as $parent) {
        $path .= '/'.$parent->nodeName;
    }
    $path .= '/'.($node instanceof DOMAttr ? '@' : '').$node->nodeName;
    if ($isLeaf) {
        $leafs[$path] = TRUE;
    }
    if ($isPath) {
        $paths[$path] = TRUE;
    }
}
$paths = array_keys($paths);
$leafs = array_keys($leafs);
var_dump($paths, $leafs);
Output:
array(3) {
[0] =>
string(6) "/foods"
[1] =>
string(11) "/foods/food"
[2] =>
string(16) "/foods/food/time"
}
array(4) {
[0] =>
string(16) "/foods/food/name"
[1] =>
string(18) "/foods/food/recipe"
[2] =>
string(16) "/foods/food/time"
[3] =>
string(22) "/foods/food/time/@unit"
}
Next show the possible record paths to the user. The user needs to select one. Knowing the record path, build a list of the possible field paths from the leafs array:
$path = '/foods/food';
$fieldLeafs = [];
$pathLength = strlen($path) + 1;
foreach ($leafs as $leaf) {
    // Keep only leafs below the record path, relative to it
    if (0 === strpos($leaf, $path.'/')) {
        $fieldLeafs[] = substr($leaf, $pathLength);
    }
}
var_dump($fieldLeafs);
Output:
array(4) {
[0] =>
string(4) "name"
[1] =>
string(6) "recipe"
[2] =>
string(4) "time"
[3] =>
string(10) "time/@unit"
}
Put up a dialog that lets the user select a path for each field. The result is a mapping from field name to path:
$fieldDefinition = [
'title' => 'name',
'url' => 'recipe',
'needed_time' => 'time',
'time_unit' => 'time/@unit'
];
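Because the selected paths end up inside an Xpath expression, it may be worth accepting only paths that were actually offered. A minimal sketch, assuming the submitted mapping arrives as a plain array; $submitted, $rejected and the 'servings' entry are hypothetical and only serve the illustration:

```php
<?php
// Field paths offered to the user (the same values as the $fieldLeafs output).
$fieldLeafs = ['name', 'recipe', 'time', 'time/@unit'];

// Hypothetical user submission; 'servings' was never offered.
$submitted = [
    'title' => 'name',
    'time_unit' => 'time/@unit',
    'portions' => 'servings',
];

// Keep only the mappings whose path is one of the offered paths.
$fieldDefinition = array_intersect($submitted, $fieldLeafs);
// Everything else can be reported back to the user.
$rejected = array_diff($submitted, $fieldLeafs);
var_dump($fieldDefinition, $rejected);
```

Both array_intersect() and array_diff() compare values and preserve keys, so the field names survive the filtering.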
Now use the path and the mapping to build up the records array:
$result = [];
foreach ($xpath->evaluate($path) as $node) {
    $record = [];
    foreach ($fieldDefinition as $field => $expression) {
        // Evaluate each field path relative to the record node
        $record[$field] = $xpath->evaluate(
            'string('.$expression.')',
            $node
        );
    }
    $result[] = $record;
}
var_dump($result);
Output:
array(2) {
[0] =>
array(4) {
'title' =>
string(9) "ravioli 1"
'url' =>
string(18) "food.com/ravioli-1"
'needed_time' =>
string(2) "10"
'time_unit' =>
string(7) "minutes"
}
[1] =>
array(4) {
'title' =>
string(9) "ravioli 2"
'url' =>
string(18) "food.com/ravioli-2"
'needed_time' =>
string(2) "11"
'time_unit' =>
string(7) "minutes"
}
}
The full example can be found at: https://eval.in/118012
The XML in the example is never converted to a generic array. Doing so would lose information and store the data twice, so don't. Extract the structure information from the XML, let the user define the mapping, then use Xpath to extract the data and store it directly in the result format.
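To tie this back to the question: collecting the paths does not depend on how deeply the records are nested. A minimal sketch, assuming a hypothetical helper named collectLeafPaths() (it is not part of the example above) and two made-up documents, one without a wrapper element and one with an extra level:

```php
<?php
// Hypothetical helper: returns the distinct paths of all leaf elements
// (elements without child elements) and of all attributes.
function collectLeafPaths(DOMDocument $dom): array
{
    $xpath = new DOMXPath($dom);
    $leafs = [];
    foreach ($xpath->evaluate('//*[not(*)]|//@*') as $node) {
        $path = '';
        // Ancestors are returned in document order, outermost first.
        foreach ($xpath->evaluate('ancestor::*', $node) as $parent) {
            $path .= '/'.$parent->nodeName;
        }
        $path .= '/'.($node instanceof DOMAttr ? '@' : '').$node->nodeName;
        // Using the path as key removes duplicates.
        $leafs[$path] = true;
    }
    return array_keys($leafs);
}

// Made-up document without a wrapper element around the record.
$flat = new DOMDocument();
$flat->loadXML('<food><name>ravioli</name><time unit="minutes">10</time></food>');

// Made-up document with an additional level above the records.
$deep = new DOMDocument();
$deep->loadXML('<menu><foods><food><name>ravioli</name></food></foods></menu>');

$flatPaths = collectLeafPaths($flat);
$deepPaths = collectLeafPaths($deep);
var_dump($flatPaths, $deepPaths);
```

The record path the user selects then simply differs per document, for example /food for the first one and /menu/foods/food for the second.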
Upvotes: 2