slash197

Reputation: 9034

file_get_contents speed-up

I am building an RSS feed discovery service by scraping a page URL and looking for the <link> tags in the page header. The problem is that some URLs take a really long time to serve the page source, so my code gets stuck at file_get_contents($url) very often.

Is there a way to do this with a predefined timeout? For example, if 10 seconds have passed and there is still no content served, simply drop that URL and move on to the next one.

I was thinking of using the maxlen parameter to get only part of the source (<head>..</head>), but I'm not sure whether this would really stop once the given number of bytes is received or would still require the full page load. The other issue is that I have no idea what value to set here, because every page has different content in the head tag, so sizes vary.

Upvotes: 0

Views: 900

Answers (2)

EyalAr

Reputation: 3170

Use the 'context' parameter. You can create a stream context with the 'stream_context_create' function and specify the desired timeout in the http context options.

$context = stream_context_create(array(
    'http' => array(
        'timeout' => YOUR_TIMEOUT,
    )
));
$content = file_get_contents(SOME_FILE, false, $context);

More information in the PHP manual (stream_context_create() and the HTTP context options).
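Putting this together with the maxlen parameter from the question, a minimal sketch (the URL and the 64 KB cap are placeholder values, not anything specific to your pages):

```php
<?php
// Give up on the connection/read after 10 seconds.
$context = stream_context_create(array(
    'http' => array(
        'timeout' => 10,
    )
));

// The 5th argument (maxlen) makes file_get_contents() stop reading
// once that many bytes have arrived; the 4th argument is the offset.
// On timeout or failure it returns false.
$html = @file_get_contents('http://example.com/', false, $context, 0, 65536);

if ($html === false) {
    // Timed out or unreachable: drop this URL and move to the next one.
} else {
    // Look for <link> tags in whatever portion we received.
    preg_match_all('/<link[^>]*>/i', $html, $links);
}
```

Since head sizes vary, the cap only needs to be "comfortably larger than any realistic head", not exact; anything past </head> is simply ignored by your parser.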

Upvotes: 2

FreudianSlip

Reputation: 2920

I've just been reading about this, so this is theory only right now, but...

This is the function definition, notice the resource context part:

string file_get_contents ( string $filename [, bool $use_include_path = false [, resource $context [, int $offset = -1 [, int $maxlen ]]]] )

If you pass in the result of a stream_context_create() call with the timeout value in its options array, it just might work.

$context = stream_context_create($opts);

Or you could create the stream yourself and set its timeout directly:

http://www.php.net/manual/en/function.stream-set-timeout.php
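That route would look roughly like this (again with a placeholder URL and byte cap); one caveat is that stream_set_timeout() only covers reads on an already-open stream, so a slow connection attempt still needs the context timeout:

```php
<?php
// Open the HTTP stream ourselves instead of using file_get_contents().
$fp = @fopen('http://example.com/', 'r');

if ($fp !== false) {
    stream_set_timeout($fp, 10);             // 10-second read timeout

    $html = stream_get_contents($fp, 65536); // read at most 64 KB

    // The stream metadata tells us whether a read actually timed out.
    $meta = stream_get_meta_data($fp);
    fclose($fp);

    if ($meta['timed_out']) {
        // The read stalled: skip this URL and move on.
    }
}
```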

Hope you have some success with it.

Upvotes: 2
