Oleg

Reputation: 2821

Why do I get wrong data using curl?

I am trying to fetch an RSS feed, but for some reason I get the wrong data:

$url = "http://rss.news.yahoo.com/rss/oddlyenough";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
$xml = curl_exec($ch);      
curl_close($ch);
echo htmlentities($xml, ENT_QUOTES, "UTF-8");

The output:

<!-- rc2.ops.ch1.yahoo.com uncompressed/chunked Sun Nov 25 15:57:06 UTC 2012 --> 

If I load the same feed another way, I get the correct data. For example, this works:

$xml = simplexml_load_file('http://rss.news.yahoo.com/rss/oddlyenough');
print "<ul>\n";
foreach ($xml->channel->item as $item){
  print "<li>$item->title</li>\n";
}
print "</ul>";

Could you please tell me what the problem is with the code that uses curl?

Upvotes: 0

Views: 1236

Answers (1)

LSerni

Reputation: 57388

You're running into a Location redirect, which curl does not follow by default.

Add this option:

  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

so as to have:

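$url = "http://rss.news.yahoo.com/rss/oddlyenough";
$ch = curl_init($url);
// Return the response body as a string instead of printing it.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Follow any "Location:" redirect the server sends.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
// Do not include the response headers in the returned string.
curl_setopt($ch, CURLOPT_HEADER, 0);
$xml = curl_exec($ch);
curl_close($ch);
echo htmlentities($xml, ENT_QUOTES, "UTF-8");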

Details

When you run the above code, the first response you receive from Yahoo! is:

HTTP/1.0 301 Moved Permanently
Date: Sun, 25 Nov 2012 16:31:36 GMT
P3P: policyref="http://info.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC GOV"
Cache-Control: max-age=3600, public
Location: http://news.yahoo.com/rss/oddlyenough
Vary: Accept-Encoding
Content-Type: text/html; charset=utf-8
Age: 1586
Content-Length: 81
Via: HTTP/1.1 rc4.ops.ch1.yahoo.com (YahooTrafficServer/1.20.10 [cHs f ])
Server: YTS/1.20.10

<!-- rc4.ops.ch1.yahoo.com uncompressed/chunked Sun Nov 25 16:31:36 UTC 2012 -->

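The Location header tells you to use the new address http://news.yahoo.com/rss/oddlyenough. Without CURLOPT_FOLLOWLOCATION, curl stops at this 301 response and returns its short body, which is the HTML comment you saw.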
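To make the redirect visible, here is a minimal sketch that pulls the Location value out of a raw header block like the one above. The helper name extract_location is hypothetical (it is not part of curl or PHP), and the header string is a shortened copy of the response shown earlier:

```php
<?php
// Hypothetical helper, for illustration only: return the value of the
// Location header from a raw HTTP response header block, or null if absent.
function extract_location(string $rawHeaders): ?string
{
    foreach (preg_split('/\r\n|\n|\r/', $rawHeaders) as $line) {
        // Header names are case-insensitive, so compare without regard to case.
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null; // no redirect target in these headers
}

// Shortened copy of the 301 response headers quoted in the answer.
$headers = "HTTP/1.0 301 Moved Permanently\r\n"
         . "Cache-Control: max-age=3600, public\r\n"
         . "Location: http://news.yahoo.com/rss/oddlyenough\r\n"
         . "Content-Type: text/html; charset=utf-8\r\n";

echo extract_location($headers), "\n";
```

With real requests you could get such a header block by setting CURLOPT_HEADER to 1, although letting curl follow the redirect itself is usually simpler.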

In fact, if you use the new address directly, your original code works (until they change the address again, that is) and is a bit faster, since it makes only one request instead of two.
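If you want to notice when the address changes again, one option is to let curl follow redirects and then ask it where it ended up. The following is a rough sketch, not a definitive implementation: the function name fetch_following_redirects and the default limit of 5 redirects are my own choices, while the CURLOPT_* and CURLINFO_* constants are standard PHP curl ones.

```php
<?php
// Hypothetical wrapper, for illustration only: fetch a URL, follow
// redirects, and report the final address and status code.
function fetch_following_redirects(string $url, int $maxRedirects = 5): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);      // return body as string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);      // follow Location headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRedirects);  // avoid redirect loops

    $body = curl_exec($ch);

    // The handle is released when $ch goes out of scope at function exit.
    return [
        'body'      => $body === false ? null : $body,
        'error'     => $body === false ? curl_error($ch) : null,
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'redirects' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
    ];
}
```

Calling fetch_following_redirects("http://rss.news.yahoo.com/rss/oddlyenough") and comparing 'final_url' with the URL you passed in tells you whether a redirect happened, so you can update the stored address and go back to a single request.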

Upvotes: 2
