Reputation: 9034
I am building an RSS feed discovery service by scraping a page URL and finding the <link>
tags in the page header. The problem is that some URLs take a really long time to serve the page source, so my code frequently gets stuck at file_get_contents($url).
Is there a way to do this with a predefined timeout? For example, if 10 seconds have passed and no content has been served yet, simply drop that URL and move on to the next one.
I was also thinking of using the maxlen
parameter to fetch only part of the source (<head>..</head>
), but I'm not sure whether reading would really stop once that many bytes are received or whether it would still require the full page load. The other issue is that I have no idea what value to set, because every page has different content in the head
tag, so the size varies.
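Something like this is what I had in mind; the 50 KB cap is just a guess and the URL is a placeholder:
$url = 'http://example.com/'; // placeholder
// Rough idea: read at most ~50 KB from the start of the page, which should
// hopefully cover <head>...</head>. 0 is the read offset, 51200 the maxlen cap in bytes.
$source = @file_get_contents($url, false, null, 0, 51200);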
Upvotes: 0
Views: 900
Reputation: 3170
Use the 'context' parameter. You can create a stream context with the stream_context_create() function and specify the desired timeout in its http options.
$context = stream_context_create(array(
    'http' => array(
        'timeout' => YOUR_TIMEOUT, // in seconds (a float is accepted)
    )
));
$content = file_get_contents(SOME_FILE, false, $context);
More information: Here and also here.
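As a rough sketch of how this could be wired into the scanning loop from the question (the URLs and the 10-second value are placeholders): file_get_contents() returns false when the request fails or times out, so the URL can simply be skipped.
$context = stream_context_create(array(
    'http' => array(
        'timeout' => 10, // seconds
    ),
));

foreach ($urls as $url) {
    // @ silences the warning emitted on failure/timeout; false signals failure.
    $html = @file_get_contents($url, false, $context);
    if ($html === false) {
        continue; // drop this URL and move on to the next one
    }
    // ... scan $html for <link> tags here ...
}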
Upvotes: 2
Reputation: 2920
I've just been reading about this, so this is theory only right now, but...
This is the function definition; note the resource $context parameter:
string file_get_contents ( string $filename [, bool $use_include_path = false [, resource $context [, int $offset = -1 [, int $maxlen ]]]] )
If you pass in the result of a stream_context_create()
call, with the timeout value set in its options array, it just might work.
$context = stream_context_create($opts);
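For completeness, a minimal sketch of what $opts could contain, mirroring the http timeout option from the other answer:
$opts = array(
    'http' => array(
        'timeout' => 10, // seconds
    ),
);
$context = stream_context_create($opts);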
Or you could create the stream yourself and set its timeout directly:
http://www.php.net/manual/en/function.stream-set-timeout.php
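An untested sketch of that stream-based approach (the URL and the 10-second limit are placeholders); stream_get_meta_data() exposes a timed_out flag you can check before deciding to keep or drop the URL:
$handle = @fopen('http://example.com/', 'r');
if ($handle !== false) {
    stream_set_timeout($handle, 10); // limit blocking reads to 10 seconds
    $content = stream_get_contents($handle);
    $meta = stream_get_meta_data($handle);
    fclose($handle);

    if ($meta['timed_out']) {
        $content = false; // reading stalled; skip this URL
    }
}
Note that the initial connection happens inside fopen() itself, so the context timeout from the other answer may still be needed to bound that part.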
Hope you have some success with it.
Upvotes: 2