Sean McRaghty
Sean McRaghty

Reputation: 295

Simple XML Load File not working

how come this isn't working:

$url = "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20xpath%3D%22%2F%2Fmeta%22%20and%20url%3D%22http://www.cnn.com%22&format=xml&diagnostics=false";

$xml = (simplexml_load_file($url))

I get multiple errors telling me the HTTP request failed. Ultimately I want to get the results from this file into an array eg

Description = CNN.com delivers the latest breaking news etc.

Keywords = CNN, CNN news, CNN.com, CNN TV etc.

But this initial stage isn't working. Any help please?

EDIT Additional information:

Errors:

warning: simplexml_load_file(http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20xpath%3D%22//meta%22%20and%20url%3D%22http://www.cnn.com%22&format=xml&diagnostics=false) [function.simplexml-load-file]: failed to open stream: HTTP request failed!
# warning: simplexml_load_file() [function.simplexml-load-file]: I/O warning : failed to load external entity "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20xpath%3D%22//meta%22%20and%20url%3D%22http://www.cnn.com%22&format=xml&diagnostics=false" 

Upvotes: 2

Views: 13452

Answers (2)

salathe
salathe

Reputation: 51950

(Note: Potentially useless answer once a real answer has been found…)


While you're figuring out the XML problem (keep working on it!) know that you can also get the YQL response back as JSON. Here's a quickie example:

$url = "http://query.yahooapis.com/v1/public/yql?q=select+%2A+"
     . "from+html+where+xpath%3D%22%2F%2Fmeta%5B%40name%3D%27"
     . "Keywords%27+or+%40name%3D%27Description%27%5D%22+and+"
     . "url%3D%22http%3A%2F%2Fwww.cnn.com%22&format=json&diagnostics=false";

// Grab YQL response and parse JSON
$json   = file_get_contents($url);
$result = json_decode($json, TRUE);

// Loop over meta results looking for what we want
$items = $result['query']['results']['meta'];
$metas = array();
foreach ($items as $item) {
    $metas[$item['name']] = $item['content'];
}
print_r($metas);

Giving an array like (text truncated for the screen):

Array
(
    [Description] => CNN.com delivers the latest breaking news and …
    [Keywords] => CNN, CNN news, CNN.com, CNN TV, news, news online …
)

Note that the YQL query (try it in the console) is slightly different to yours, to make the PHP simpler.

Upvotes: 3

Well, the XML is GETable. As for valid, it lacks <?xml version="1.0"?>, yet I think it's not required.

<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="5" yahoo:created="2010-03-09T05:09:03Z" yahoo:lang="en-US" yahoo:updated="2010-03-09T05:09:03Z" yahoo:uri="http://query.yahooapis.com/v1/yql?q=select+*+from+html+where+xpath%3D%22%2F%2Fmeta%22+and+url%3D%22http%3A%2F%2Fwww.cnn.com%22"><results><meta content="HTML Tidy for Java (vers. 26 Sep 2004), see www.w3.org" name="generator"/><meta content="1800;url=?refresh=1" http-equiv="refresh"/><meta content="CNN.com delivers the latest breaking news and information on the latest top stories, weather, business, entertainment, politics, and more. For in-depth coverage, CNN.com provides special reports, video, audio, photo galleries, and interactive guides." name="Description"/><meta content="CNN, CNN news, CNN.com, CNN TV, news, news online, breaking news, U.S. news, world news, weather, business, CNN Money, sports, politics, law, technology, entertainment, education, travel, health, special reports, autos, developing story, news video, CNN Intl" name="Keywords"/><meta content="text/html; charset=iso-8859-1" http-equiv="content-type"/></results></query><!-- total: 250 --> 

Tested it on my local server (PHP 5.3), no errors reported. I've used your source code and it works. Here's a print_r():


SimpleXMLElement Object
(
    [results] => SimpleXMLElement Object
        (
            [meta] => Array
                (
                    [0] => SimpleXMLElement Object
                        (
                            [@attributes] => Array
                                (
                                    [content] => HTML Tidy for Java (vers. 26 Sep 2004), see www.w3.org
                                    [name] => generator
                                )

                        )

                    [1] => SimpleXMLElement Object
                        (
                            [@attributes] => Array
                                (
                                    [content] => 1800;url=?refresh=1
                                    [http-equiv] => refresh
                                )

                        )

                    [2] => SimpleXMLElement Object
                        (
                            [@attributes] => Array
                                (
                                    [content] => CNN.com delivers the latest breaking news and information on the latest top stories, weather, business, entertainment, politics, and more. For in-depth coverage, CNN.com provides special reports, video, audio, photo galleries, and interactive guides.
                                    [name] => Description
                                )

                        )

                    [3] => SimpleXMLElement Object
                        (
                            [@attributes] => Array
                                (
                                    [content] => CNN, CNN news, CNN.com, CNN TV, news, news online, breaking news, U.S. news, world news, weather, business, CNN Money, sports, politics, law, technology, entertainment, education, travel, health, special reports, autos, developing story, news video, CNN Intl
                                    [name] => Keywords
                                )

                        )

                    [4] => SimpleXMLElement Object
                        (
                            [@attributes] => Array
                                (
                                    [content] => text/html; charset=iso-8859-1
                                    [http-equiv] => content-type
                                )

                        )

                )

        )

)

I'd suggest you to encode the URL, but that's already done. You could try performing the query with cURL.

Upvotes: 0

Related Questions