Reputation: 3233
I have written a Perl script that checks a list of URLs by sending a GET request to each one.
Now, let's say that one of these URLs points to a very large file, for instance one larger than 100 MB.
I send the request to download this file like this:
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
my $url = "http://somewebsitename.com/very_big_file.txt";
$mech->get($url);
Once the GET request is sent, the file starts downloading. I want to cancel that download using WWW::Mechanize. How can I do that?
I checked the documentation of this Perl Module here:
http://metacpan.org/pod/WWW::Mechanize
However, I could not find a method which would help me do this.
Thanks.
Upvotes: 0
Views: 190
Reputation: 24063
GET request
Using the :content_cb option, you can provide a callback function to get() that will be executed for each chunk of response content received from the server. You can set* the chunk size (in bytes) using the :read_size_hint option. These options are documented in LWP::UserAgent (get() in WWW::Mechanize is just an overridden version of the same method in LWP::UserAgent).
The following request will be aborted after the first chunk of response content (roughly 1024 bytes) has been read:
use WWW::Mechanize;

sub callback {
    my ($data, $response, $protocol) = @_;
    die "Too much data";
}

my $mech = WWW::Mechanize->new;
my $url = 'http://www.example.com';
$mech->get($url, ':content_cb' => \&callback, ':read_size_hint' => 1024);
print $mech->response()->header('X-Died');
Too much data at ./mechanize line 12.
Note that the die in the callback does not cause the program itself to die; it simply sets the X-Died header in the response object. You can add the appropriate logic to your callback to determine under what conditions a request should be aborted.
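As a minimal sketch of such logic, the callback below counts the bytes received so far and dies only once a limit is exceeded. The helper name make_size_limited_cb and the 100 MB limit in the usage comment are assumptions for illustration; they are not part of WWW::Mechanize or LWP::UserAgent.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): returns a content callback
# plus a reference to the buffer in which accepted chunks are collected.
sub make_size_limited_cb {
    my ($max_bytes) = @_;
    my $received = 0;
    my $content  = '';

    my $cb = sub {
        my ($data, $response, $protocol) = @_;
        $received += length $data;
        # Dying inside the callback aborts the request.
        die "Exceeded $max_bytes bytes\n" if $received > $max_bytes;
        $content .= $data;
        return;
    };

    return ($cb, \$content);
}

# Possible usage with WWW::Mechanize (assumed setup, not run here):
#   my ($cb, $content_ref) = make_size_limited_cb(100 * 1024 * 1024);
#   $mech->get($url, ':content_cb' => $cb);
#   my $died = $mech->response->header('X-Died');
#   warn "Aborted: $died" if defined $died;
```

Because the callback receives the content itself, the sketch keeps the accepted chunks in its own buffer rather than relying on the response object to hold them.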
Based on your comments, it sounds like what you really want is to never send a request in the first place if the content is too large. This is quite different from aborting a GET request midway through, since you can fetch the Content-Length header with a HEAD request and perform different actions based on the value:
my @urls = qw(http://www.example.com http://www.google.com);

foreach my $url (@urls) {
    $mech->head($url);
    if ($mech->success) {
        my $length = $mech->response()->header('Content-Length') // 0;
        next if $length > 1024;
        $mech->get($url);
    }
}
Note that according to the HTTP spec, applications should set the Content-Length header. This does not mean that they will (hence the default value of 0 in my code example).
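If you would rather treat a missing Content-Length differently from a small one, one option is to classify the header value explicitly. The following is only a sketch: classify_by_length and its three return values are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: decide what to do with a URL based on the
# Content-Length value obtained from a HEAD request.
sub classify_by_length {
    my ($content_length, $max_bytes) = @_;

    # Header missing or not a plain non-negative integer: size is unknown.
    return 'unknown'
        unless defined $content_length && $content_length =~ /^\d+$/;

    return $content_length > $max_bytes ? 'skip' : 'fetch';
}

# Possible usage (assumed setup, not run here):
#   $mech->head($url);
#   my $length = $mech->response->header('Content-Length');  # scalar context
#   my $action = classify_by_length($length, 1024);
```

With this approach, an 'unknown' result can be handled however suits your script, for example by skipping the URL or by downloading it with a callback that aborts the request.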
* According to the documentation, the protocol module "will try to read data from the server in chunks of this size," but I don't think it's guaranteed.
Upvotes: 6