Reputation: 699
A client wants us to deliver content via an RSS feed. They use cURL to fetch the feed contents, but they say they get a 404 error instead. I have tried this command in the terminal:
$ curl -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php > temp.xml
and, as the client reports, I get the 404 page instead of the feed. When I open the URI in a browser, the feed displays without any problem.
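Checking just the status code with curl's -w option (output discarded, only to confirm the diagnosis) shows the same thing:
$ curl -g --compressed -s -o /dev/null -w '%{http_code}\n' http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php
404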
I cannot change anything in the client's app, so how can I ensure that they get the feed instead of the 404 error?
Thanks!
Upvotes: 3
Views: 2368
Reputation: 14479
My initial thought was that this might be related to cookies (see this question), but it may be a localized issue. This is working fine from my machine:
[root@devtest tmp]# curl -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php > temp.xml
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27926    0 27926    0     0  54564      0 --:--:-- --:--:-- --:--:-- 69815
Thanks to Julien for pointing out that the contents of the downloaded file were actually the custom 404 page. As he mentions, you need to add a user-agent flag (-A) to your curl requests:
# curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12" -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php > temp.xml
I would just delete my answer, but it's worth leaving up as a warning to others who might be experiencing this issue - make sure you validate the response!
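One simple way to validate from the command line is curl's --fail flag (a sketch, assuming a POSIX shell), which makes curl exit non-zero on an HTTP 4xx/5xx instead of quietly saving the error page:
$ curl --fail -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php > temp.xml || echo "got an HTTP error instead of the feed"
With --fail, a 404 leaves temp.xml empty and curl exits with code 22, so the || branch fires rather than the bogus "feed" being passed along.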
Upvotes: 0
Reputation: 32982
Indeed, curl returns a 404 status page:
$ curl -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php -s -o /dev/null -D-
HTTP/1.1 404 Not Found
Date: Tue, 04 Mar 2014 08:12:27 GMT
Server: Apache
X-Pingback: http://mediosymedia.com/xmlrpc.php
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
Many webservers are suspicious of requests without a browser User-Agent, because they expect curl to be used for scraping. This is probably not the smartest blocking technique, since simple User-Agent spoofing defeats it:
$ curl -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php -s -o /dev/null -D- -H'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:27.0) Gecko/20100101 Firefox/27.0'
HTTP/1.1 200 OK
Date: Tue, 04 Mar 2014 08:13:46 GMT
Server: Apache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Transfer-Encoding: chunked
Content-Type: text/xml;charset=utf-8
So, in practice, make sure you set a User-Agent on your requests that is not curl's default.
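For instance, assuming the client's machine runs curl from an environment that reads ~/.curlrc, a default User-Agent can be configured once there, so every request sends it without touching the client app's command line:
# ~/.curlrc - options listed here are applied to every curl invocation
user-agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:27.0) Gecko/20100101 Firefox/27.0"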
Upvotes: 2