Reputation: 55
I just made a script that grabs links from a website and saves them into a text file.
Now I'm working on my regexes so it will grab links that contain php?dl=
in the URL from the text file:
E.g.: www.example.com/site/admin/a_files.php?dl=33931
It's pretty much the address you get when you hover over the dl
button on the site, from which you can click to download or "right click and save".
I'm just wondering how to achieve this: downloading the content of the given address, which will download a *.txt
file. All from the script, of course.
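For the filtering step, here is a minimal sketch of what that could look like, assuming the saved links sit one per line in a file named links.txt (both the file name and the layout are assumptions):
#!/usr/bin/env perl
use strict;
use warnings;

# Assumed input: one URL per line in links.txt
open my $in, '<', 'links.txt' or die "Cannot open links.txt: $!";

while ( my $line = <$in> ) {
    chomp $line;
    # Keep only download links such as .../a_files.php?dl=33931
    print "$line\n" if $line =~ /php\?dl=\d+/;
}

close $in;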
Upvotes: 5
Views: 5187
Reputation: 431
Old question, but when I'm writing quick scripts, I often use "wget" or "curl" and a pipe. This may not be portable across systems, but if I know my system has one or the other of these commands, it's generally good enough.
For example:
#!/usr/bin/env perl
use strict;
use warnings;

# Open a read pipe from curl and stream its output
open my $fp, '-|', 'curl', 'http://www.example.com/' or die "Cannot run curl: $!";
while (<$fp>) {
    print;
}
close $fp;
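If the goal is to save the download straight to disk rather than stream it, the same idea works without the pipe; a sketch assuming curl is on the PATH (the URL and the output file name are placeholders):
# Save the response straight to a file via curl's -o option
my $target = 'http://www.example.com/site/admin/a_files.php?dl=33931';
system( 'curl', '-s', '-o', 'download.txt', $target ) == 0
    or die "curl exited with status $?";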
Upvotes: 0
Reputation: 37146
Make WWW::Mechanize
your new best friend.
Here's why:
- It can identify links on a webpage that match a specific regex (/php\?dl=/ in this case)
- It can follow those links through the follow_link method
- It can get the targets of those links and save them to file
All this without needing to save your wanted links in an intermediate file! Life's sweet when you have the right tool for the job...
Example
use strict;
use warnings;
use WWW::Mechanize;

my $url  = 'http://www.example.com/';
my $mech = WWW::Mechanize->new();

$mech->get( $url );

# Collect every link matching the download pattern
my @linksOfInterest = $mech->find_all_links( text_regex => qr/php\?dl=/ );

my $fileNumber = 1;

foreach my $link (@linksOfInterest) {
    # Fetch the link target and save it straight to a numbered file
    $mech->get( $link, ':content_file' => "file" . ( $fileNumber++ ) . ".txt" );
    $mech->back();
}
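Note that text_regex matches the link's visible text. Since the php?dl= pattern in the question lives in the href itself, the url_regex option of find_all_links may be the better fit here:
# Match against the link's URL instead of its text
my @linksOfInterest = $mech->find_all_links( url_regex => qr/php\?dl=/ );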
Upvotes: 8
Reputation: 149823
You can download the file with LWP::UserAgent:
use LWP::UserAgent;

my $ua = LWP::UserAgent->new();
# Save the response body directly to file.txt
my $response = $ua->get($url, ':content_file' => 'file.txt');
Or, if you need a filehandle:
# Open an in-memory filehandle over the response body
open my $fh, '<', $response->content_ref or die $!;
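It's usually worth confirming that the request succeeded before using the body; a small addition using the standard HTTP::Response accessors:
# Abort with the HTTP status line if the download failed
die "Download failed: " . $response->status_line . "\n"
    unless $response->is_success;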
Upvotes: 5