Reputation: 2140
I'm trying to write an RSS scraper that pulls down a feed and uses XMLReader (or DOMDocument) to print a list of the tags it contains.
My host does not allow file_get_contents on remote URLs, so I pull the feed into a variable and then use the "load from string" way of instantiating XMLReader or DOMDocument (I've tried both so far).
When I test on my local box (where file_get_contents is enabled) my script is able to pull out the XML tags. When I use cURL, however, I get a range of errors.
I have already tried UTF-8 encoding the string after running it through html_entity_decode.
The cURL options I am using are:
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
If I grab the text from the feed with my browser and save it as a file on my box then my script is also able to recognize the tags.
The error I am getting when using cURL is a parsing error.
So my question is - what settings must I use with cURL to be able to parse RSS?
Upvotes: 1
Views: 1563
Avoid file_get_contents() for remote files: it tends to be slow and CPU intensive, and it gives you far less control over redirects, caching, cookies, etc. than curl does.
Even better than curl (faster, more flexible, and less CPU intensive) is using fsockopen() directly. There are many PHP classes that make this dead simple; here is one of my favorites:
http://scripts.incutio.com/httpclient/
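For the curl route, here is a minimal sketch of fetching the feed and listing its tags. It is not a definitive fix, since the exact parse error is not shown in the question. The helper names fetch_feed() and list_tags() and the user-agent string are made up for illustration. The extra options (following redirects, accepting compressed responses) are guesses at common causes of a body that will not parse. Note also that running html_entity_decode over the feed before parsing can turn &amp; into a bare &, which by itself makes the XML ill-formed, so the sketch parses the body as received.

```php
<?php
// Hypothetical helper: fetch a feed body with curl and fail loudly on errors.
function fetch_feed(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);             // keep response headers out of the body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects to the real feed URL
    curl_setopt($ch, CURLOPT_ENCODING, '');          // accept compressed responses and decode them
    curl_setopt($ch, CURLOPT_USERAGENT, 'feed-tag-lister/0.1'); // assumed value, pick your own

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error  = curl_error($ch);

    if ($body === false || $status !== 200) {
        throw new RuntimeException("Fetch failed (HTTP $status): $error");
    }
    return $body;
}

// Hypothetical helper: list the distinct element names in an XML string,
// in the order they first appear.
function list_tags(string $xml): array
{
    libxml_use_internal_errors(true);   // collect parse errors instead of emitting warnings
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        $err = libxml_get_last_error();
        libxml_clear_errors();
        $msg = $err ? trim($err->message) : 'unknown error';
        throw new RuntimeException("XML parse error: $msg");
    }

    $names = [];
    foreach ($doc->getElementsByTagName('*') as $node) {
        $names[$node->nodeName] = true; // array keys de-duplicate the names
    }
    return array_keys($names);
}
```

Usage would look something like print_r(list_tags(fetch_feed($url))); with $url set to the feed address. If loadXML still fails, the exception message carries the first libxml error, which usually shows whether the body is an HTML error page, a truncated response, or genuinely malformed XML.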
Upvotes: 2