Stick it to THE MAN

Reputation: 5701

fopen not working for some urls?

I am having problems reading some URLs. There is nothing wrong with the URLs themselves, as I can view them in my browser. Here is one example:

http://www.bloomberg.com/apps/news?pid=20601087&sid=a2BhXFMpbb5M

I am using fopen like this in my code:

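public static function grokPage($path)
{
    $data = '';
    $file = fopen($path, "r");

    if ($file)
    {
        // Read line by line; fgets() returns false at EOF or on a read error,
        // so the loop cannot spin forever if the stream fails mid-read.
        while (($line = fgets($file, 1024)) !== false)
            $data .= $line;

        fclose($file); // release the stream handle
    }
    return $data;
}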

The error I get is:

Warning: fopen(http://www.bloomberg.com/apps/news?pid=20601087&sid=a2BhXFMpbb5M) [function.fopen]: failed to open stream: Redirection limit reached, aborting in xxx_filename.php

From the PHP fopen documentation, it seems I am using the function correctly. Does anyone understand the redirection warning and how to fix it?

Upvotes: 1

Views: 2063

Answers (2)

Charles

Reputation: 51411

"Redirection limit reached" means that the remote site was sending back a Location header, that Location was followed, and then the redirected location gave a Location header again. This process continued until some predefined number of redirections (Location headers) was reached.

It's likely that the site is intentionally trying to redirect the client somewhere else, but has a bug that is causing a loop.

You should consider another way to fetch the URL, one that lets you specify things like the user-agent string. curl is ugly, but it works well. Try identifying as IE6 or Firefox instead of PHP or curl.
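As a rough sketch of the curl approach, something like the following could work. The names `buildCurlOptions` and `fetchWithCurl`, the user-agent string, and the redirect limit of 10 are all illustrative assumptions, not values taken from the question. Enabling the cookie engine is also a guess: some sites redirect in a loop until a cookie is accepted, but nothing confirms that this is what happens here.

```php
<?php
// Hypothetical helper: collects the curl options in one place.
function buildCurlOptions(string $userAgent, int $maxRedirects): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow Location headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // give up after this many redirects
        CURLOPT_USERAGENT      => $userAgent,    // send a browser-like user-agent (assumed value)
        CURLOPT_COOKIEFILE     => '',            // empty string: keep cookies in memory for this handle
        CURLOPT_TIMEOUT        => 30,            // seconds; arbitrary choice
    ];
}

// Hypothetical replacement for the fopen-based fetch in the question.
function fetchWithCurl(
    string $url,
    string $userAgent = 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
    int $maxRedirects = 10
): string {
    $ch = curl_init($url);
    curl_setopt_array($ch, buildCurlOptions($userAgent, $maxRedirects));

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() describes the failure, e.g. too many redirects.
        throw new RuntimeException('curl failed: ' . curl_error($ch));
    }
    return $body;
}
```

Calling `fetchWithCurl($path)` would then stand in for the `fopen` loop. If the site really does redirect in a loop regardless of client, this will still fail, but the exception message should make that visible.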

Edit: Pekka's comment contains a link with information on using fopen wrappers, including how to set the user-agent string.

Upvotes: 2

Pekka

Reputation: 449623

This means that your target page is returning more redirects to different addresses (probably using a Location: header) than your max_redirects setting allows. For the HTTP stream wrapper, max_redirects defaults to 20.

This looks like a very good article on how to fetch web pages using the fopen wrappers. It contains an example on how to change the max_redirects setting.
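For illustration, here is a minimal sketch of setting these options through a stream context, which keeps the original `fopen` approach. The function names `buildHttpContext` and `grokPageWithContext`, the user-agent string, and the limit of 30 are assumptions made for this example. Note that raising `max_redirects` only helps if the redirect chain is long but finite; it will not fix a genuine loop.

```php
<?php
// Hypothetical helper: builds a context for the http:// wrapper.
function buildHttpContext(
    int $maxRedirects = 30,
    string $userAgent = 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)'
) {
    return stream_context_create([
        'http' => [
            'method'        => 'GET',
            'max_redirects' => $maxRedirects, // PHP's default is 20
            'user_agent'    => $userAgent,    // overrides the user_agent ini setting
            'timeout'       => 30,            // seconds; arbitrary choice
        ],
    ]);
}

// Hypothetical variant of grokPage() that passes the context to fopen().
function grokPageWithContext(string $path): string
{
    // Fourth argument is the stream context; third is use_include_path.
    $file = fopen($path, 'r', false, buildHttpContext());
    if ($file === false) {
        return '';
    }

    // Read the rest of the stream in one call instead of looping over fgets().
    $data = stream_get_contents($file);
    fclose($file);

    return $data === false ? '' : $data;
}
```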

It could well be, though, that Bloomberg is shutting you out intentionally because it detects automated data scraping, which may be a violation of its terms and conditions.

Upvotes: 2
