Reputation: 2191
I'm trying to read an RSS feed using PHP. For some reason SimpleXML does not pick up this namespaced content tag:
<a10:content type="text/xml">...</a10:content>
This is an example of what the feed could look like:
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
<channel>
<title>mMin title</title>
<description>Some description</description>
<managingEditor>[email protected]</managingEditor>
<category>Some category</category>
<item>
<guid isPermaLink="false">1</guid>
<link>https://example.com/1</link>
<title>Some title 1</title>
<a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
<a10:content type="text/xml">
<Location>San diego</Location>
<PublishedOn>2016-10-21T11:21:07</PublishedOn>
<Body>Lorem ipsum dolar</Body>
<JobCountry>USA</JobCountry>
</a10:content>
</item>
<item>
<guid isPermaLink="false">1</guid>
<link>https://example.com/2</link>
<title>Some title 2</title>
<a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
<a10:content type="text/xml">
<Location>Detroit</Location>
<PublishedOn>2016-10-21T11:21:07</PublishedOn>
<Body>Lorem ipsum dolar</Body>
<JobCountry>USA</JobCountry>
</a10:content>
</item>
<item>
<guid isPermaLink="false">1</guid>
<link>https://example.com/3</link>
<title>Some title 3</title>
<a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
<a10:content type="text/xml">
<Location>Los Angeles</Location>
<PublishedOn>2016-10-21T11:21:07</PublishedOn>
<Body>Lorem ipsum dolar</Body>
<JobCountry>USA</JobCountry>
</a10:content>
</item>
</channel>
</rss>
Here is my code.
$url = "http://example.com/RSSFeed";
$xml = simplexml_load_file($url);
foreach ($xml->channel as $x) {
foreach ($x->item as $item) {
dd($item);
}
}
Which outputs
SimpleXMLElement {#111 ▼
+"guid": "1"
+"link": "https://example.com"
+"title": "Some title"
}
Here is my expected output
SimpleXMLElement {#111 ▼
+"guid": "1"
+"link": "https://example.com"
+"title": "Some title"
+"content" {
0 => {
+"Location": "San Diego"
+"PublishedOn": "2016-10-21T11:21:07"
+"Body": "Lorem ipsum dolar"
+"JobCountry": "USA"
}
1 => {
+"Location": "Detroit"
+"PublishedOn": "2016-10-21T11:21:07"
+"Body": "Lorem ipsum dolar"
+"JobCountry": "USA"
}
2 => {
+"Location": "Los Angeles"
+"PublishedOn": "2016-10-21T11:21:07"
+"Body": "Lorem ipsum dolar"
+"JobCountry": "USA"
}
}
}
Does anyone have a solution for this?
Upvotes: 2
Views: 869
Reputation: 2191
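Here is my working solution:
$xml = file_get_contents("https://example.com/RSSFeed");
// Strip the "a10:" prefix so SimpleXML sees <content> as an ordinary child element.
$string = str_replace(array("<a10:content","</a10:content>"), array("<content","</content>"), $xml);
$sxe = new \SimpleXMLElement($string);
// Iterate over the items; iterating $sxe directly would yield the <channel> element instead.
foreach ($sxe->channel->item as $item) {
    dd($item);
}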
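The string replacement works because it removes the namespace prefix that SimpleXML's property access skips by default. As an alternative, here is a minimal sketch that leaves the XML untouched and asks SimpleXML for the Atom-namespaced children instead. The inline `$xml` sample and the `$atomNs`, `$jobs`, and `$fields` names are assumptions made for illustration, not part of the original feed code.

```php
<?php
// Hypothetical inline sample standing in for the real feed.
$xml = <<<XML
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
<channel>
<item>
<title>Some title 1</title>
<a10:content type="text/xml">
<Location>San diego</Location>
<JobCountry>USA</JobCountry>
</a10:content>
</item>
</channel>
</rss>
XML;

$atomNs = 'http://www.w3.org/2005/Atom';
$sxe = new SimpleXMLElement($xml);

$jobs = [];
foreach ($sxe->channel->item as $item) {
    // children($atomNs) selects the child elements that live in the Atom namespace.
    $content = $item->children($atomNs)->content;

    $fields = [];
    // The nested elements carry no namespace, so children() without arguments
    // is used to switch back from the Atom namespace.
    foreach ($content->children() as $name => $value) {
        $fields[$name] = (string) $value;
    }

    $jobs[] = ['title' => (string) $item->title] + $fields;
}

print_r($jobs);
```

Matching on the namespace URI rather than the `a10` prefix means the code should keep working if the feed ever declares the same namespace under a different prefix.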
Upvotes: 2
Reputation: 12365
Firstly, don't use SimpleXML, it's crap! You are much better off using DOMDocument.
http://php.net/manual/en/class.domdocument.php
<?php
$dom = new DOMDocument();
$dom->loadXML($xml);
$items = $dom->getElementsByTagName('item');
$array = array();
foreach($items as $item)
{
$title = $item->getElementsByTagName('title')->item(0)->nodeValue;
$link = $item->getElementsByTagName('link')->item(0)->nodeValue;
$updated = $item->getElementsByTagName('updated')->item(0)->nodeValue;
$location = $item->getElementsByTagName('Location')->item(0)->nodeValue;
$pub = $item->getElementsByTagName('PublishedOn')->item(0)->nodeValue;
$body = $item->getElementsByTagName('Body')->item(0)->nodeValue;
$job = $item->getElementsByTagName('JobCountry')->item(0)->nodeValue;
$array[] = [
'title' => $title,
'link' => $link,
'updated' => $updated,
'Location' => $location,
'PublishedOn' => $pub,
'Body' => $body,
'JobCountry' => $job,
];
}
var_dump($array);
Which will give you this for each item (first one shown):
array(7) { ["title"]=> string(12) "Some title 1" ["link"]=> string(21) "https://example.com/1" ["updated"]=> string(25) "2017-05-30T13:20:22+02:00" ["Location"]=> string(9) "San diego" ["PublishedOn"]=> string(19) "2016-10-21T11:21:07" ["Body"]=> string(17) "Lorem ipsum dolar" ["JobCountry"]=> string(3) "USA" }
See here! https://3v4l.org/E0UXJ
Now that it works, let's tidy it up by creating a convenience function:
function domToArray($item, array $cols)
{
$array = [];
foreach ($cols as $col) {
$val = $item->getElementsByTagName($col)->item(0)->nodeValue;
$array[$col] = $val;
}
return $array;
}
$dom = new DOMDocument();
$dom->loadXML($xml);
$items = $dom->getElementsByTagName('item');
$array = array();
$fields = [
'title',
'link',
'updated',
'Location',
'PublishedOn',
'Body',
'JobCountry',
];
foreach($items as $item)
{
$array[] = domToArray($item, $fields);
}
var_dump($array);
Same output, see here https://3v4l.org/W6HM3
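One caveat with the helper above: `item(0)` returns `null` when an item has no matching element, and reading `nodeValue` on that `null` emits a notice or warning, depending on the PHP version. Below is a minimal sketch of a more tolerant variant. The function name `domToArraySafe` and the cut-down sample XML are hypothetical, chosen only for this illustration.

```php
<?php
// Hypothetical variant of domToArray() that tolerates missing elements.
function domToArraySafe(DOMElement $item, array $cols): array
{
    $array = [];
    foreach ($cols as $col) {
        $node = $item->getElementsByTagName($col)->item(0);
        // item(0) is null when nothing matched; store null instead of
        // dereferencing it.
        $array[$col] = $node !== null ? $node->nodeValue : null;
    }
    return $array;
}

// Cut-down sample in which the item has no <Location> element.
$xml = '<rss><channel><item><title>Some title 1</title>'
     . '<link>https://example.com/1</link></item></channel></rss>';

$dom = new DOMDocument();
$dom->loadXML($xml);

$rows = [];
foreach ($dom->getElementsByTagName('item') as $item) {
    $rows[] = domToArraySafe($item, ['title', 'link', 'Location']);
}

var_dump($rows);
```

Whether a missing field should become `null`, an empty string, or be skipped altogether depends on how the resulting array is consumed; `null` is just one reasonable choice.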
Upvotes: 0
Reputation: 15141
You should use the namespace to access the element. Here we use DOMDocument to achieve the desired output. DOMDocument's getElementsByTagNameNS function takes the namespace URI and the local name of the element you want, so the namespaced content can be selected directly.
If you prefer to use simplexml_load_string you can check this out. PHP code demo
<?php
ini_set('display_errors', 1);
libxml_use_internal_errors(true);
$string=<<<HTML
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
<channel>
<title>mMin title</title>
<description>Some description</description>
<managingEditor>[email protected]</managingEditor>
<category>Some category</category>
<item>
<guid isPermaLink="false">1</guid>
<link>https://example.com</link>
<title>Some title</title>
<a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
<a10:content type="text/xml">
<Location>Detroit</Location>
<PublishedOn>2016-10-21T11:21:07</PublishedOn>
<Body>Lorem ipsum dolar</Body>
<JobCountry>USA</JobCountry>
</a10:content>
</item>
</channel>
</rss>
HTML;
$completeData=array();
$domDocument = new DOMDocument();
$domDocument->loadXML($string);
// Match on the namespace URI rather than the "a10" prefix, which the feed is free to change.
$results=$domDocument->getElementsByTagNameNS("http://www.w3.org/2005/Atom", "content");
foreach($results as $result)
{
    // Reset per content element so values do not accumulate across items.
    $data=array();
    foreach($result->childNodes as $node)
    {
        // Skip the whitespace text nodes between the child elements.
        if($node instanceof DOMElement)
        {
            $data[]=$node->nodeValue;
        }
    }
    $completeData[]=$data;
}
print_r($completeData);
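The code above collects the values under numeric indexes. To get closer to the keyed structure the question asks for, each value can be stored under its element name. This is a minimal sketch under that assumption; the two-item sample and the `$keyed` and `$row` names are made up for illustration.

```php
<?php
// Hypothetical two-item sample modelled on the feed in the question.
$string = <<<XML
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
<channel>
<item>
<title>Some title 1</title>
<a10:content type="text/xml">
<Location>San diego</Location>
<JobCountry>USA</JobCountry>
</a10:content>
</item>
<item>
<title>Some title 2</title>
<a10:content type="text/xml">
<Location>Detroit</Location>
<JobCountry>USA</JobCountry>
</a10:content>
</item>
</channel>
</rss>
XML;

$dom = new DOMDocument();
$dom->loadXML($string);

$keyed = [];
$contents = $dom->getElementsByTagNameNS('http://www.w3.org/2005/Atom', 'content');
foreach ($contents as $content) {
    $row = []; // fresh array for each <a10:content> element
    foreach ($content->childNodes as $node) {
        // Only element nodes carry data; whitespace text nodes are skipped.
        if ($node instanceof DOMElement) {
            $row[$node->localName] = $node->nodeValue;
        }
    }
    $keyed[] = $row;
}

print_r($keyed);
```

Keying by `localName` assumes each field appears at most once per content element; a repeated element name would overwrite the earlier value.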
Upvotes: 1