Edvard Åkerberg

Reputation: 2191

PHP: reading an RSS feed cannot read the <a10:content type="text/xml"> tag

I'm trying to read an RSS feed using PHP, but for some reason it cannot read this content tag:

<a10:content type="text/xml">...</a10:content>

This is an example of what an item could look like

<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
    <channel>
        <title>mMin title</title>
        <description>Some description</description>
        <managingEditor>[email protected]</managingEditor>
        <category>Some category</category>
        <item>
            <guid isPermaLink="false">1</guid>
            <link>https://example.com/1</link>
            <title>Some title 1</title>
            <a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
            <a10:content type="text/xml">
                <Location>San diego</Location>
                <PublishedOn>2016-10-21T11:21:07</PublishedOn>
                <Body>Lorem ipsum dolar</Body>
                <JobCountry>USA</JobCountry>
            </a10:content>
        </item>
        <item>
            <guid isPermaLink="false">1</guid>
            <link>https://example.com/2</link>
            <title>Some title 2</title>
            <a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
            <a10:content type="text/xml">
                <Location>Detroit</Location>
                <PublishedOn>2016-10-21T11:21:07</PublishedOn>
                <Body>Lorem ipsum dolar</Body>
                <JobCountry>USA</JobCountry>
            </a10:content>
        </item>
        <item>
            <guid isPermaLink="false">1</guid>
            <link>https://example.com/3</link>
            <title>Some title 3</title>
            <a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
            <a10:content type="text/xml">
                <Location>Los Angeles</Location>
                <PublishedOn>2016-10-21T11:21:07</PublishedOn>
                <Body>Lorem ipsum dolar</Body>
                <JobCountry>USA</JobCountry>
            </a10:content>
        </item>
    </channel>
</rss>

Here is my code.

    $url = "http://example.com/RSSFeed";
    $xml = simplexml_load_file($url);

    foreach ($xml->channel as $x) {
        foreach ($x->item as $item) {

            dd($item);
        }
    }

This outputs:

    SimpleXMLElement {#111 ▼
      +"guid": "1"
      +"link": "https://example.com"
      +"title": "Some title"
    }

Here is my expected output:

SimpleXMLElement {#111 ▼
  +"guid": "1"
  +"link": "https://example.com"
  +"title": "Some title"
  +"content" {
    0 => {
        +"Location": "San Diego"
        +"PublishedOn": "2016-10-21T11:21:07"
        +"Body": "Lorem ipsum dolar"
        +"JobCountry": "USA"
    }
    1 => {
        +"Location": "Detroit"
        +"PublishedOn": "2016-10-21T11:21:07"
        +"Body": "Lorem ipsum dolar"
        +"JobCountry": "USA"
    }
    2 => {
        +"Location": "Los Angeles"
        +"PublishedOn": "2016-10-21T11:21:07"
        +"Body": "Lorem ipsum dolar"
        +"JobCountry": "USA"
    }
  }
}

Does anyone have a solution for this?

Upvotes: 2

Views: 869

Answers (3)

Edvard Åkerberg

Reputation: 2191

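Here is my working solution:

// Fetch the raw feed XML as a string
$xml = file_get_contents("https://example.com/RSSFeed");

// Rename <a10:content> to a plain <content> so SimpleXML exposes it as an ordinary child
$string = str_replace(array("<a10:content","</a10:content>"), array("<content","</content>"), $xml);

$sxe = new \SimpleXMLElement($string);

// Loop over the <item> elements inside <channel>
foreach ($sxe->channel->item as $item) {

     dd($item);

}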
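For comparison, here is a minimal sketch that avoids rewriting the XML string (it is not the answer author's code). SimpleXML hides namespaced children until you ask for them by passing the namespace URI to children(); calling children() again with no argument switches back to the un-namespaced elements inside <a10:content>. It assumes the a10 prefix is bound to http://www.w3.org/2005/Atom, as in the sample feed; the inline $feed heredoc and the $jobs row layout are illustrative only.

```php
<?php
// Abridged sample feed; in practice this string would come from file_get_contents()
$feed = <<<XML
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
  <channel>
    <item>
      <title>Some title 1</title>
      <a10:content type="text/xml">
        <Location>San diego</Location>
        <JobCountry>USA</JobCountry>
      </a10:content>
    </item>
    <item>
      <title>Some title 2</title>
      <a10:content type="text/xml">
        <Location>Detroit</Location>
        <JobCountry>USA</JobCountry>
      </a10:content>
    </item>
  </channel>
</rss>
XML;

$atomNs = 'http://www.w3.org/2005/Atom'; // URI the a10 prefix is bound to

$sxe = new SimpleXMLElement($feed);
$jobs = [];
foreach ($sxe->channel->item as $item) {
    // Switch into the Atom namespace to reach <a10:content>
    $content = $item->children($atomNs)->content;

    $row = ['title' => (string) $item->title];
    // <Location>, <JobCountry>, ... have no namespace, so children() with no argument finds them
    foreach ($content->children() as $field) {
        $row[$field->getName()] = (string) $field;
    }
    $jobs[] = $row;
}
print_r($jobs);
```

Because the lookup is keyed on the namespace URI rather than the prefix, it keeps working if the feed ever switches from a10 to a different prefix.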

Upvotes: 2

delboy1978uk

Reputation: 12365

Firstly, don't use SimpleXML; it's crap! You are much better off using DOMDocument.

http://php.net/manual/en/class.domdocument.php

<?php

// $xml is assumed to already hold the feed XML as a string (e.g. from file_get_contents())
$dom = new DOMDocument();
$dom->loadXML($xml);


$items = $dom->getElementsByTagName('item');
$array = array();

foreach($items as $item)
{
    $title = $item->getElementsByTagName('title')->item(0)->nodeValue;
    $link = $item->getElementsByTagName('link')->item(0)->nodeValue;
    $updated = $item->getElementsByTagName('updated')->item(0)->nodeValue;
    $location = $item->getElementsByTagName('Location')->item(0)->nodeValue;
    $pub = $item->getElementsByTagName('PublishedOn')->item(0)->nodeValue;
    $body = $item->getElementsByTagName('Body')->item(0)->nodeValue;
    $job = $item->getElementsByTagName('JobCountry')->item(0)->nodeValue;

    $array[] = [
        'title' => $title,
        'link' => $link, 
        'updated' => $updated, 
        'Location' => $location, 
        'PublishedOn' => $pub, 
        'Body' => $body, 
        'JobCountry' => $job, 
    ];
}

var_dump($array);

Which will give you this:

array(7) { ["title"]=> string(12) "Some title 1" ["link"]=> string(21) "https://example.com/1" ["updated"]=> string(25) "2017-05-30T13:20:22+02:00" ["Location"]=> string(9) "San diego" ["PublishedOn"]=> string(19) "2016-10-21T11:21:07" ["Body"]=> string(17) "Lorem ipsum dolar" ["JobCountry"]=> string(3) "USA" }

See here! https://3v4l.org/E0UXJ

Now that it works, let's optimise it by creating a convenience function:

function domToArray($item, array $cols)
{
    $array = [];
    foreach ($cols as $col) {
        $val = $item->getElementsByTagName($col)->item(0)->nodeValue;
        $array[$col] = $val;
    }
    return $array;
}

$dom = new DOMDocument();
$dom->loadXML($xml);

$items = $dom->getElementsByTagName('item');
$array = array();

$fields = [
    'title',
    'link',
    'updated',
    'Location',
    'PublishedOn',
    'Body',
    'JobCountry',
];

foreach($items as $item)
{
    $array[] = domToArray($item, $fields);
}

var_dump($array);

Same output; see here: https://3v4l.org/W6HM3
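One caveat with domToArray(): it calls ->item(0)->nodeValue unconditionally, so an item that lacks one of the listed tags triggers a warning (a notice on PHP 7) about reading a property on null. A hedged sketch of a null-safe variant, assuming PHP 8.0+ for the ?-> operator and assuming a missing field should simply come back as null; the name domToArraySafe is hypothetical.

```php
<?php
// Hypothetical null-safe variant of domToArray(); needs PHP 8.0+ for ?->
// A tag that is missing from the item yields null instead of a warning.
function domToArraySafe(DOMElement $item, array $cols): array
{
    $array = [];
    foreach ($cols as $col) {
        $array[$col] = $item->getElementsByTagName($col)->item(0)?->nodeValue;
    }
    return $array;
}

$xml = <<<XML
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
  <channel>
    <item>
      <title>Some title 1</title>
      <a10:content type="text/xml">
        <Location>San diego</Location>
      </a10:content>
    </item>
  </channel>
</rss>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$rows = [];
foreach ($dom->getElementsByTagName('item') as $item) {
    // This sample item has no <Body>, so 'Body' comes back as null
    $rows[] = domToArraySafe($item, ['title', 'Location', 'Body']);
}
var_dump($rows);
```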

Upvotes: 0

Sahil Gulati

Reputation: 15141

You need to go through the namespace to access these elements. Here we use DOMDocument to achieve the desired output: DOMDocument::getElementsByTagNameNS takes the namespace URI and the local name of the element (here content), so it returns exactly the a10:content elements.

If you prefer to use simplexml_load_string, you can check out this PHP code demo.
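A hedged sketch of one SimpleXML route (not necessarily what the linked demo does): bind a prefix to the Atom namespace URI with registerXPathNamespace() and select the content elements with XPath. The children of <a10:content> carry no namespace, so plain property access reaches them. The inline $feed sample is illustrative only.

```php
<?php
$feed = <<<XML
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
  <channel>
    <item>
      <a10:content type="text/xml">
        <Location>San diego</Location>
        <JobCountry>USA</JobCountry>
      </a10:content>
    </item>
    <item>
      <a10:content type="text/xml">
        <Location>Detroit</Location>
        <JobCountry>USA</JobCountry>
      </a10:content>
    </item>
  </channel>
</rss>
XML;

$sx = simplexml_load_string($feed);
// Bind a prefix to the Atom namespace URI for use in XPath expressions
$sx->registerXPathNamespace('a10', 'http://www.w3.org/2005/Atom');

$locations = [];
foreach ($sx->xpath('//item/a10:content') as $content) {
    // <Location> is un-namespaced, so normal property access works here
    $locations[] = (string) $content->Location;
}
print_r($locations);
```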

Try this code snippet here:

<?php

ini_set('display_errors', 1);

libxml_use_internal_errors(true);   
$string=<<<HTML
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
    <channel>
        <title>mMin title</title>
        <description>Some description</description>
        <managingEditor>[email protected]</managingEditor>
        <category>Some category</category>
        <item>
            <guid isPermaLink="false">1</guid>
            <link>https://example.com</link>
            <title>Some title</title>
            <a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
            <a10:content type="text/xml">
                <Location>Detroit</Location>
                <PublishedOn>2016-10-21T11:21:07</PublishedOn>
                <Body>Lorem ipsum dolar</Body>
                <JobCountry>USA</JobCountry>
            </a10:content>
        </item>
    </channel>
</rss>
HTML;
$completeData=array();
$domDocument = new DOMDocument();
$domDocument->loadXML($string);
// Select <a10:content> by namespace URI + local name; the prefix itself does not matter
$results=$domDocument->getElementsByTagNameNS("http://www.w3.org/2005/Atom", "content");
foreach($results as $result)
{
    $data=array(); // reset per <a10:content>, otherwise values from earlier items accumulate
    foreach($result->childNodes as $node)
    {
        // skip the whitespace text nodes between the child elements
        if($node instanceof DOMElement)
        {
            $data[]=$node->nodeValue;
        }
    }
    $completeData[]=$data;
}
print_r($completeData);
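Roughly the same lookup can be sketched with DOMXPath, which also matches on the namespace URI rather than the prefix. This variant keys each row by element name, which is closer to the expected output in the question; the sample XML and the atom query prefix are assumptions for illustration.

```php
<?php
$xml = <<<XML
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
  <channel>
    <item>
      <title>Some title 1</title>
      <a10:content type="text/xml">
        <Location>San diego</Location>
        <JobCountry>USA</JobCountry>
      </a10:content>
    </item>
    <item>
      <title>Some title 2</title>
      <a10:content type="text/xml">
        <Location>Detroit</Location>
        <JobCountry>USA</JobCountry>
      </a10:content>
    </item>
  </channel>
</rss>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
// The query prefix ("atom") need not match the document's prefix ("a10");
// only the namespace URI has to agree.
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');

$completeData = [];
foreach ($xpath->query('//item/atom:content') as $content) {
    $row = [];
    // '*' selects element children only, so whitespace text nodes are skipped
    foreach ($xpath->query('*', $content) as $field) {
        $row[$field->localName] = $field->nodeValue;
    }
    $completeData[] = $row;
}
print_r($completeData);
```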

Upvotes: 1
