Reputation: 1764
UPDATE: I've reworked the question to show the progress I've made, and hopefully to make it easier to answer.
UPDATE 2: I've added another value to the XML: the extensions available in each zip. Each item can have multiple extensions, separated by a tab, so the result should be structured like this: Platform > Extension (sub-group) > Name > Title. If an item has more than one extension, it will appear in multiple places.
I have the following XML file.
<Item>
<Platform>Windows</Platform>
<Ext>gif jpeg doc</Ext>
<Name>File Group 1</Name>
<Title>This is the first file group</Title>
<DownloadPath>/this/windows/1/1.zip</DownloadPath>
</Item>
<Item>
<Platform>Windows</Platform>
<Ext>gif doc</Ext>
<Name>File Group 1</Name>
<Title>This is the first file group</Title>
<DownloadPath>/this/windows/1/2.zip</DownloadPath>
</Item>
<Item>
<Platform>Windows</Platform>
<Ext>gif</Ext>
<Name>File Group 1</Name>
<Title>This is in the same group but has a different title</Title>
<DownloadPath>/this/windows/1/3.zip</DownloadPath>
</Item>
<Item>
<Platform>Mac</Platform>
<Ext>gif jpeg doc</Ext>
<Name>File Group 1</Name>
<Title>This has the same group name but a different platform. Because it has the same title and name the files are added to this array below.</Title>
<DownloadPath>/this/mac/1/1.zip</DownloadPath>
</Item>
<Item>
<Platform>Mac</Platform>
<Ext>jpeg doc</Ext>
<Name>File Group 1</Name>
<Title>This has the same group name but a different platform. Because it has the same title and name the files are added to this array below.</Title>
<DownloadPath>/this/mac/1/2.zip</DownloadPath>
</Item>
<Item>
<Platform>Windows</Platform>
<Ext>gif jpeg doc</Ext>
<Name>File Group 2</Name>
<Title>This is the second file group</Title>
<DownloadPath>/this/windows/2/1.zip</DownloadPath>
</Item>
<Item>
<Platform>Windows</Platform>
<Ext>gif jpeg doc</Ext>
<Name>File Group 2</Name>
<Title>This is the second file group</Title>
<DownloadPath>/this/windows/2/2.zip</DownloadPath>
</Item>
<Item>
<Platform>Mac</Platform>
<Ext>gif jpeg doc</Ext>
<Name>File Group 3</Name>
<Title>This is the second mac file group really.</Title>
<DownloadPath>/this/windows/3/1.zip</DownloadPath>
</Item>
I want to be able to go through it and sort it so I can insert it into a normalized table schema. Here is the format I would like the array to be built in.
[Windows] => Array (
[0] => array(
"Name" => "File Group 1",
"Title" => "This is the first file group",
"Files" => array(
[0] => array(
"DownloadPath" => "/this/windows/1/1.zip"
),
[1] => array(
"DownloadPath" => "/this/windows/1/2.zip"
)
)
),
[1] => array(
"Name" => "File Group 1",
"Title" => "This has the same name but has a different title, so it should be separate.",
"Files" => array(
[0] => array(
"DownloadPath" => "/this/windows/1/3.zip"
)
)
),
[2] => array(
"Name" => "File Group 2",
"Title" => "This is the second file group",
"Files" => array(
[0] => array(
"DownloadPath" => "/this/windows/2/1.zip"
),
[1] => array(
"DownloadPath" => "/this/windows/2/2.zip"
)
)
)
),
[Mac] => Array(
[0] => array(
"Name" => "File Group 1",
"Title" => "This has the same group name but a different platform. Because it has the same title and name the files are added to this array below.",
"Files" => array(
[0] => array(
"DownloadPath" => "/this/mac/1/1.zip"
),
[1] => array(
"DownloadPath" => "/this/mac/1/2.zip"
)
)
),
[1] => array(
"Name" => "File Group 3",
"Title" => "This is the second mac file group really.",
"Files" => array(
[0] => array(
"DownloadPath" => "/this/mac/1/1.zip"
),
[1] => array(
"DownloadPath" => "/this/mac/1/2.zip"
)
)
),
)
Here is what I've got so far with my PHP:
$scrape_xml = "files.xml";
$xml = simplexml_load_file($scrape_xml);
$groups = array();
foreach ($xml->Item as $file){
if (!isset($groups[stripslashes($file->Platform)][stripslashes($file->Name)][stripslashes($file->Title)])){
$groups[stripslashes($file->Platform)][stripslashes($file->Name)][stripslashes($file->Title)] = array(
'Platform' => $file->Platform,
'Name' => $file->Name,
'Title' => $file->Title
);
}
$groups[stripslashes($file->Platform)][stripslashes($file->Name)][stripslashes($file->Title)]['Files'][] = $file->DownloadPath;
}
echo "count=" . count($xml->Item);
echo "<pre>";
print_r($groups);
echo "</pre>";
It gives me this result:
Array
(
[Windows] => Array
(
[File Group 1] => Array
(
[This is the first file group] => Array
(
[Platform] => SimpleXMLElement Object
(
[0] => Windows
)
[Name] => SimpleXMLElement Object
(
[0] => File Group 1
)
[Title] => SimpleXMLElement Object
(
[0] => This is the first file group
)
[Files] => Array
(
[0] => SimpleXMLElement Object
(
[0] => /this/windows/1/1.zip
)
[1] => SimpleXMLElement Object
(
[0] => /this/windows/1/2.zip
)
)
)
[This is in the same group but has a different title] => Array
(
[Platform] => SimpleXMLElement Object
(
[0] => Windows
)
[Name] => SimpleXMLElement Object
(
[0] => File Group 1
)
[Title] => SimpleXMLElement Object
(
[0] => This is in the same group but has a different title
)
[Files] => Array
(
[0] => SimpleXMLElement Object
(
[0] => /this/windows/1/3.zip
)
)
)
)
[File Group 2] => Array
(
[This is the second file group] => Array
(
[Platform] => SimpleXMLElement Object
(
[0] => Windows
)
[Name] => SimpleXMLElement Object
(
[0] => File Group 2
)
[Title] => SimpleXMLElement Object
(
[0] => This is the second file group
)
[Files] => Array
(
[0] => SimpleXMLElement Object
(
[0] => /this/windows/2/1.zip
)
[1] => SimpleXMLElement Object
(
[0] => /this/windows/2/2.zip
)
)
)
)
)
[Mac] => Array
(
[File Group 1] => Array
(
[This has the same group name but a different platform. Because it has the same title and name the files are added to this array below.] => Array
(
[Platform] => SimpleXMLElement Object
(
[0] => Mac
)
[Name] => SimpleXMLElement Object
(
[0] => File Group 1
)
[Title] => SimpleXMLElement Object
(
[0] => This has the same group name but a different platform. Because it has the same title and name the files are added to this array below.
)
[Files] => Array
(
[0] => SimpleXMLElement Object
(
[0] => /this/mac/1/1.zip
)
[1] => SimpleXMLElement Object
(
[0] => /this/mac/1/2.zip
)
)
)
)
[File Group 3] => Array
(
[This is the second mac file group really.] => Array
(
[Platform] => SimpleXMLElement Object
(
[0] => Mac
)
[Name] => SimpleXMLElement Object
(
[0] => File Group 3
)
[Title] => SimpleXMLElement Object
(
[0] => This is the second mac file group really.
)
[Files] => Array
(
[0] => SimpleXMLElement Object
(
[0] => /this/windows/3/1.zip
)
)
)
)
)
)
UPDATE 2: New Array Structure
[Windows] => Array (
[gif] =>Array(
[0] => array(
"Name" => "File Group 1",
"Title" => "This is the first file group",
"Files" => array(
[0] => array(
"DownloadPath" => "/this/windows/1/1.zip"
),
[1] => array(
"DownloadPath" => "/this/windows/1/2.zip"
)
)
)
),
[jpeg] => array(
[0] => array(
"Name" => "File Group 1",
"Title" => "This is the first file group",
"Files" => array(
[0] => array(
"DownloadPath" => "/this/windows/1/1.zip"
),
[1] => array(
"DownloadPath" => "/this/windows/1/2.zip"
)
)
),
[1] => array(
"Name" => "File Group 2",
"Title" => "This is the second file group",
"Files" => array(
[0] => array(
"DownloadPath" => "/this/windows/2/1.zip"
),
[1] => array(
"DownloadPath" => "/this/windows/2/2.zip"
)
)
)
),
[doc] => array(
[0] => array(
"Name" => "File Group 1",
"Title" => "This is the first file group",
"Files" => array(
[0] => array(
"DownloadPath" => "/this/windows/1/1.zip"
),
[1] => array(
"DownloadPath" => "/this/windows/1/2.zip"
)
)
),
[1] => array(
"Name" => "File Group 1",
"Title" => "This has the same name but has a different title, so it should be separate.",
"Files" => array(
[0] => array(
"DownloadPath" => "/this/windows/1/3.zip"
)
)
),
[2] => array(
"Name" => "File Group 2",
"Title" => "This is the second file group",
"Files" => array(
[0] => array(
"DownloadPath" => "/this/windows/2/1.zip"
),
[1] => array(
"DownloadPath" => "/this/windows/2/2.zip"
)
)
)
)
),
[Mac] => Array(
[gif] => array(
[0] => array(
"Name" => "File Group 2",
"Title" => "This is the second file group",
"Files" => array(
[0] => array(
"DownloadPath" => "/this/mac/2/1.zip"
),
[1] => array(
"DownloadPath" => "/this/mac/2/2.zip"
)
)
),
[1] => array(
"Name" => "File Group 2",
"Title" => "This is the second file group",
"Files" => array(
[0] => array(
"DownloadPath" => "/this/mac/2/1.zip"
),
[1] => array(
"DownloadPath" => "/this/mac/2/2.zip"
)
)
),
)
[jpeg] => array(
[0] => array(
"Name" => "File Group 2",
"Title" => "This is the second file group",
"Files" => array(
[0] => array(
"DownloadPath" => "/this/mac/2/1.zip"
),
[1] => array(
"DownloadPath" => "/this/mac/2/2.zip"
)
)
)
)
[doc] => array(
[0] => array(
"Name" => "File Group 1",
"Title" => "This has the same group name but a different platform. Because it has the same title and name the files are added to this array below.",
"Files" => array(
[0] => array(
"DownloadPath" => "/this/mac/1/1.zip"
),
[1] => array(
"DownloadPath" => "/this/mac/1/2.zip"
)
)
),
[1] => array(
"Name" => "File Group 3",
"Title" => "This is the second mac file group really.",
"Files" => array(
[0] => array(
"DownloadPath" => "/this/mac/1/1.zip"
),
[1] => array(
"DownloadPath" => "/this/mac/1/2.zip"
)
)
)
)
)
UPDATE 3: There is some garbage coming through for the file list.
<Item>
<Platform>Windows</Platform>
<Ext>gif jpeg doc</Ext>
<Name>File Group 1</Name>
<Title>This is the first file group</Title>
<DownloadPath>/this/windows/1/1.zip</DownloadPath>
</Item>
<Item>
<Platform>Windows</Platform>
<Ext>gif jpeg doc</Ext>
<Name>File Group 1</Name>
<Title>This is the first file group</Title>
<DownloadPath>/this/windows/1/2.zip</DownloadPath>
</Item>
<Item>
<Platform>Windows</Platform>
<Ext>gif jpeg doc</Ext>
<Name>File Group 1</Name>
<Title>This is the first file group</Title>
<DownloadPath>/this/windows/2/1.zip</DownloadPath>
</Item>
<Item>
<Platform>Windows</Platform>
<Ext>gif jpeg doc</Ext>
<Name>File Group 1</Name>
<Title>This is the first file group</Title>
<DownloadPath>/this/windows/2/2.zip</DownloadPath>
</Item>
Some items have the same platform, extensions, name and title. Items 3 and 4 above need to be skipped and saved to a separate array that I will handle later.
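One possible way to set those items aside, sketched here under an assumption that is not stated in the question: an item counts as a duplicate when its Platform/Ext/Name/Title match an earlier item but its DownloadPath sits in a different directory. The helper name `splitDuplicates` and the `<Items>` root element are made up for illustration.

```php
<?php
// Hypothetical helper: separates the items to keep from duplicates to handle later.
// Assumption: the first directory seen for a Platform/Ext/Name/Title combination
// "wins"; items with the same combination in another directory are duplicates.
function splitDuplicates(SimpleXMLElement $xml): array
{
    $firstDir = array();   // combination key => directory of its first item
    $kept = array();
    $duplicates = array();
    foreach ($xml->Item as $item) {
        $key = implode('|', array(
            (string) $item->Platform,
            (string) $item->Ext,
            (string) $item->Name,
            (string) $item->Title,
        ));
        $path = (string) $item->DownloadPath;
        $dir = dirname($path);
        if (!isset($firstDir[$key])) {
            $firstDir[$key] = $dir;
        }
        if ($firstDir[$key] === $dir) {
            $kept[] = $path;
        } else {
            $duplicates[] = $path;
        }
    }
    return array('kept' => $kept, 'duplicates' => $duplicates);
}

// Rebuild the four items from UPDATE 3 under an assumed <Items> root.
$xmlString = '<Items>';
foreach (array('1/1', '1/2', '2/1', '2/2') as $p) {
    $xmlString .= "<Item><Platform>Windows</Platform><Ext>gif jpeg doc</Ext>"
        . "<Name>File Group 1</Name><Title>This is the first file group</Title>"
        . "<DownloadPath>/this/windows/$p.zip</DownloadPath></Item>";
}
$xmlString .= '</Items>';

$result = splitDuplicates(simplexml_load_string($xmlString));
print_r($result);
```

If the real rule for spotting duplicates is different, only the comparison on `$dir` should need to change.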
Upvotes: 1
Views: 209
Reputation: 197544
You are merely mapping the input values into the output array by arranging them differently. This is your structure:
Array(
[... Item/Platform] => Array (
[... Item/Title as 0-n] => array(
"Name" => Item/Name,
"Title" => Item/Title,
"Files" => array(
[...] => array(
"DownloadPath" => Item/DownloadPath
),
)
),
)
)
The mapping can be done by iterating over the items within the XML and storing the values into the appropriate place in the new array (I named it $build):
$build = array();
// $items is the list of <Item> elements, e.g. $xml->Item after simplexml_load_file().
foreach($items as $item)
{
$platform = (string) $item->Platform;
$title = (string) $item->Title;
isset($build[$platform][$title]) ?: $build[$platform][$title] = array(
'Name' => (string) $item->Name,
'Title' => $title
);
$build[$platform][$title]['Files'][] = array('DownloadPath' => (string) $item->DownloadPath);
}
$build = array_map('array_values', $build);
The array_map call is done at the end to convert the Item/Title keys into numerical ones.
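To see what that final step does in isolation, here is a small sketch with made-up sample data (the titles are placeholders, not values from the XML). With a single input array, array_map keeps the outer string keys, while array_values renumbers each inner array from 0.

```php
<?php
// Made-up sample data: groups keyed by title inside each platform.
$build = array(
    'Windows' => array(
        'First title'  => array('Name' => 'File Group 1', 'Title' => 'First title'),
        'Second title' => array('Name' => 'File Group 1', 'Title' => 'Second title'),
    ),
    'Mac' => array(
        'Third title' => array('Name' => 'File Group 3', 'Title' => 'Third title'),
    ),
);

// Platform keys are preserved; the title keys are replaced by 0, 1, 2, ...
$build = array_map('array_values', $build);

print_r($build);
```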
And that's it; here is the Demo.
Let me know if that's helpful.
Edit: For your updated data, only a slight modification of the above is needed. The key principles of the previous example still apply; the extra duplication for each additional extension of an item is handled by adding another iteration inside:
$build = array();
foreach($items as $item)
{
$platform = (string) $item->Platform;
$title = (string) $item->Title;
foreach(preg_split('~\s+~', trim((string) $item->Ext)) as $ext)
{
isset($build[$platform][$ext][$title])
?:$build[$platform][$ext][$title] = array(
'Name' => (string) $item->Name,
'Title' => $title
);
$build[$platform][$ext][$title]['Files'][]
= array('DownloadPath' => (string) $item->DownloadPath);
}
}
$build = array_map(function($v) {return array_map('array_values', $v);}, $build);
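The question says the extensions are tab-separated while the sample XML shows spaces, so splitting on any run of whitespace covers both. A minimal sketch of just that step; the helper name `splitExtensions` is hypothetical:

```php
<?php
// Hypothetical helper: splits an <Ext> value on any run of whitespace
// (spaces, tabs, newlines) and drops empty pieces.
function splitExtensions(string $ext): array
{
    return preg_split('~\s+~', $ext, -1, PREG_SPLIT_NO_EMPTY);
}

$fromSpaces = splitExtensions('gif jpeg doc');
$fromTabs   = splitExtensions("gif\tjpeg\tdoc");
$fromMessy  = splitExtensions("  gif \t doc\n");

print_r($fromSpaces);
print_r($fromTabs);
print_r($fromMessy);
```

PREG_SPLIT_NO_EMPTY is what keeps leading or trailing whitespace from producing an empty extension key.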
Upvotes: 1
Reputation: 7896
This is the code that will give you the result you need. UPDATE: This concerns the latest grouping you asked for.
$scrape_xml = "files.xml";
$xml = simplexml_load_file($scrape_xml);
$groups2 = array();
foreach ($xml->Item as $file){
$platform = stripslashes($file->Platform);
$name = stripslashes($file->Name);
$title = stripslashes($file->Title);
$extensions = explode(' ', $file->Ext);
foreach($extensions as $extension)
{
if (!isset($groups2[$platform])) $groups2[$platform] = array();
if (!isset($groups2[$platform][$extension])) $groups2[$platform][$extension] = array();
$groupFound = false;
for($idx = 0; $idx < count($groups2[$platform][$extension]); $idx ++) {
if ($groups2[$platform][$extension][$idx]["Name"] == $name
&& $groups2[$platform][$extension][$idx]["Title"] == $title) {
$groups2[$platform][$extension][$idx]["Files"][] =
array('DownloadPath' => $file->DownloadPath."");
$groupFound = true;
break;
}
}
if ($groupFound) continue;
$groups2[$platform][$extension][] =
array(
"Name" => $name,
"Title" => $title,
"Files" => array(array('DownloadPath' => $file->DownloadPath."")));
}
}
echo "<br />";
echo "<pre>";
print_r($groups2);
echo "</pre>";
Upvotes: 0
Reputation: 3373
You can try this:
$scrape_xml = "files.xml";
$xml = simplexml_load_file($scrape_xml);
$group = array();
foreach ($xml->Item as $file)
{
$platform = stripslashes($file->Platform);
$name = stripslashes($file->Name);
$title = stripslashes($file->Title);
$downloadPath = stripslashes($file->DownloadPath);
if(!isset($group[$platform]))
{
$group[$platform] = array();
$group[$platform][] = array("Name" => $name,"Title" => $title, "Files" => array($downloadPath));
}
else
{
$found = false;
for($i=0;$i<count($group[$platform]);$i++)
{
if($group[$platform][$i]["Name"] == $name && $group[$platform][$i]["Title"] == $title)
{
$group[$platform][$i]["Files"][] = $downloadPath;
$found = true;
break;
}
}
if(!$found)
{
$group[$platform][] = array("Name" => $name,"Title" => $title, "Files" => array($downloadPath));
}
}
}
echo "<pre>".print_r($group,true)."</pre>";
Upvotes: 0
Reputation: 596
How's something like this? Code is a bit sloppy, and tweaks should probably be made to improve the validation.
class XMLFileImporter {
public $file; //Absolute path to import file
public $import = array();
public $xml;
public $error = false;
public function __construct($file) {
$this->file = $file;
$this->load();
}
public function load() {
if(!is_readable($this->file)) {
$this->error("File is not readable");
return false;
}
$xml = simplexml_load_file($this->file);
if(!$xml) {
$this->error("XML could not be parsed");
return false;
}
$this->xml = json_decode(json_encode($xml));
return true;
}
public function import() {
$count = $this->parseItems();
echo "Imported $count rows";
}
public function parseItems() {
if($this->error()){
return false;
}
if(!self::validateXML($this->xml)) {
$this->error("Invalid SimpleXML object");
return false;
}
if(!self::validateArray($this->xml->Item)) {
$this->error("Invalid Array 'Item' on SimpleXML object");
return false;
}
$count = 0;
foreach($this->xml->Item as $item) {
if($this->parseItem($item)){
$count++;
}
}
return $count;
}
public function parseItem($item) {
if($this->error()){
return false;
}
if(!self::validateItem($item)) {
$this->error("Invalid file item");
return false;
}
$item = self::normalizeItem($item);
$this->handlePlatform((string)$item->Platform);
$this->handleGroup($item);
$this->handleSubGroup($item);
$this->handleFile($item);
return true;
}
public function handlePlatform($platform) {
if(!isset($this->import[$platform])) {
$this->import[$platform] = array();
}
return true;
}
public function handleGroup($item) {
if(!isset($this->import[$item->Platform][$item->Name])) {
$this->import[$item->Platform][$item->Name] = array();
}
return true;
}
public function handleSubGroup($item) {
if(!isset($this->import[$item->Platform][$item->Name][$item->Title])) {
$this->import[$item->Platform][$item->Name][$item->Title] = array();
}
return true;
}
public function handleFile($item) {
array_push($this->import[$item->Platform][$item->Name][$item->Title],$item->DownloadPath);
}
public function error($set=false) {
if($set){
$this->error = $set;
return true;
}
return $this->error;
}
public static function validateXML($xml) {
return is_object($xml);
}
public static function validateArray($arr,$min=1){
return is_array($arr) && count($arr) >= $min;
}
public static function validateItem($item){
return (isset($item->Title)
&& isset($item->Name)
&& isset($item->DownloadPath)
&& isset($item->Platform));
}
public static function normalizeItem($item){
$item->Name = stripslashes(trim((string)$item->Name));
$item->Title = stripslashes(trim((string)$item->Title));
$item->Platform = (string)$item->Platform;
$item->DownloadPath = (string)$item->DownloadPath;
return $item;
}
public function output() {
print_r($this->import);
return true;
}
}
$importer = new XMLFileImporter(dirname(__FILE__)."/files.xml");
$importer->load();
$importer->import();
$importer->output();
var_dump($importer->error());
Upvotes: 0
Reputation: 6573
I prefer DOMDocument and XPath myself, so this is what I'd do...
$xml = '\path\to\your\file.xml';
$doc = new DOMDocument( '1.0', 'UTF-8' );
$doc->load( $xml );
$dxpath = new DOMXPath( $doc );
$items = $dxpath->query( '//Item' );
$db = new PDO( 'mysql:dbname=YOURDB;host=YOURHOST', $DBUSER, $DBPASS );
$ins = $db->prepare('
INSERT INTO ur_table
( `platform` , `name` , `title` , `path` )
VALUES
( :platform , :name , :title , :path );
');
foreach( $items as $item )
{
// Element names are case-sensitive and must match the XML; placeholders must match the query.
$ins->bindValue( ':platform' , $item->getElementsByTagName( 'Platform' )->item(0)->nodeValue , PDO::PARAM_STR );
$ins->bindValue( ':name' , $item->getElementsByTagName( 'Name' )->item(0)->nodeValue , PDO::PARAM_STR );
$ins->bindValue( ':title' , $item->getElementsByTagName( 'Title' )->item(0)->nodeValue , PDO::PARAM_STR );
$ins->bindValue( ':path' , $item->getElementsByTagName( 'DownloadPath' )->item(0)->nodeValue , PDO::PARAM_STR );
$ins->execute();
}
No need for stripslashes and whatnot - it will handle all that for you.
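To try the DOM/XPath part without a database, here is a rough sketch that only collects the values into rows. It assumes the `<Item>` elements sit under a single root element (called `<Items>` here, which is a guess), and it uses DOMXPath::evaluate with string() as an alternative to getElementsByTagName.

```php
<?php
// Assumption: a single <Items> root wraps the <Item> elements.
$doc = new DOMDocument();
$doc->loadXML(
    '<Items><Item><Platform>Mac</Platform><Ext>gif jpeg doc</Ext>'
    . '<Name>File Group 3</Name>'
    . '<Title>This is the second mac file group really.</Title>'
    . '<DownloadPath>/this/windows/3/1.zip</DownloadPath></Item></Items>'
);

$xpath = new DOMXPath($doc);
$rows = array();
foreach ($xpath->query('//Item') as $item) {
    // string(X) returns the text of the first child element named X,
    // or an empty string when there is none.
    $rows[] = array(
        'platform' => $xpath->evaluate('string(Platform)', $item),
        'name'     => $xpath->evaluate('string(Name)', $item),
        'title'    => $xpath->evaluate('string(Title)', $item),
        'path'     => $xpath->evaluate('string(DownloadPath)', $item),
    );
}

print_r($rows);
```

Each row could then be bound to the prepared statement's placeholders one by one.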
Upvotes: 0
Reputation: 324600
You haven't explained what you're seeing wrong, exactly, so I'm going to have to guess.
First, in your source, your last DownloadPath is /this/windows/3/1.zip even though it's supposed to be a Mac file. That's a typo, I'm sure, but the output will "look wrong" with that there.
Next, if you want strings rather than SimpleXMLElement objects, you need this (I've also done some tidying to avoid so many stripslashes() calls):
foreach ($xml->Item as $file) {
$platform = stripslashes((string) $file->Platform);
$name = stripslashes((string) $file->Name);
$title = stripslashes((string) $file->Title);
if( !isset($groups[$platform][$name][$title])) {
$groups[$platform][$name][$title] = array(
'Platform' => $platform,
'Name' => $name,
'Title' => $title
);
}
$groups[$platform][$name][$title]['Files'][] = (string) $file->DownloadPath;
}
Notice the (string) bits? They cast the object to a string, which gives you the literal value rather than the object. This is also why your array keys worked: stripslashes() implicitly converted the objects to strings (only strings and integers may be used as array keys).
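A quick way to see the difference for yourself, as a small sketch with a one-item document (the `<Items>` root element is an assumption):

```php
<?php
$xml = simplexml_load_string(
    '<Items><Item><Platform>Windows</Platform></Item></Items>'
);
$item = $xml->Item[0];

$raw  = $item->Platform;            // still a SimpleXMLElement object
$text = (string) $item->Platform;   // plain PHP string

var_dump($raw instanceof SimpleXMLElement);
var_dump(is_string($text));

// The cast value is safe to use both as an array key and as a stored value.
$groups = array();
$groups[$text] = array('Platform' => $text);
print_r($groups);
```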
I think that's all I can find that might answer your question. If it isn't please let me know more clearly what's wrong and I'll be happy to try and help.
Upvotes: 0
Reputation: 10469
Start by declaring:
$groups[stripslashes($file->Platform)][stripslashes($file->Name)]
[stripslashes($file->Title)] = (object)array(
'Name' => $file->Name,
'Title' => $file->Title,
'Files' => (object)array()
);
This will get you closer.
You should also check the type of each XML element as you get it to see if it's an array or a simple object, then treat it accordingly.
Upvotes: 0