I am trying to download files from a web page.
First I get the links with HTML::LinkExtractor, and then I want to download them with LWP. I'm new to programming in Perl.
I wrote the following code:
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use HTML::LinkExtractor;
use LWP::Simple qw(get);
use Archive::Zip;
my $html = get $ARGV[0];
my $te = HTML::TableExtract->new(
keep_html => 1,
headers => [qw( column1 column2 )],
);
$te->parse($html);
# I get only the first row
my ($row) = $te->rows;
my $LXM = new HTML::LinkExtractor(undef,undef,1);
$LXM->parse(\$$row[0]);
my ($t) = $LXM->links;
my $LXS = new HTML::LinkExtractor(undef,undef,1);
$LXS->parse(\$$row[1]);
my ($s) = $LXS->links;
#-------
for (my $i=0; $i < scalar(@$s); $i++) {
print "$$s[$i]{_TEXT} $$s[$i]{href} $$t[$i]{href} \n";
my $file = '/tmp/$$s[$i]{_TEXT}';
my $url = $$s[$i]{href};
my $content = getstore($url, $file);
die "Couldn't get it!" unless defined $content;
}
And I get the following error:
Undefined subroutine &main::getstore called at ./geturlfromtable.pl line 35.
Thanks in advance!
LWP::Simple can be loaded in two different ways.
use LWP::Simple;
This loads the module and imports all of its default functions (including get() and getstore()) into your program.
use LWP::Simple qw(list of function names);
This loads the module and only makes available the specific set of functions you have requested.
You have this code:
use LWP::Simple qw(get);
This makes the get() function available, but not the getstore() function.
To fix this, either add getstore() to your list of functions:
use LWP::Simple qw(get getstore);
Or (probably simpler) remove the list of functions:
use LWP::Simple;
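A quick way to see the effect of the import list (a small sketch; the printed messages are just illustrative): with qw(get), main::getstore is never defined, although the subroutine still exists in the LWP::Simple package and could also be called by its fully qualified name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);    # import list: only get() is brought into main

# getstore() was not in the import list, so main::getstore does not exist...
print defined &main::getstore ? "main::getstore exists\n" : "main::getstore is not imported\n";

# ...but the subroutine still lives in LWP::Simple, so
# LWP::Simple::getstore($url, $file) would work without importing it.
print defined &LWP::Simple::getstore ? "LWP::Simple::getstore exists\n" : "LWP::Simple::getstore missing\n";
```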
Update: I hope you don't mind if I add a couple of style points.
Firstly, you're using a really old module: HTML::LinkExtractor. It hasn't been updated for almost fifteen years. I'd recommend looking at HTML::LinkExtor (part of the HTML-Parser distribution) instead.
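A minimal sketch of the HTML::LinkExtor approach, run on a made-up HTML fragment rather than the real table cell. One caveat: HTML::LinkExtor reports tags and their link attributes only, not the link text, so the _TEXT values the question uses for file names would have to come from somewhere else.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# Made-up HTML standing in for one table cell ($row->[1] in the question).
my $html = '<td><a href="http://example.com/a.zip">a.zip</a> <a href="/b.zip">b.zip</a></td>';

my @hrefs;
# The callback receives the tag name followed by (attribute => URL) pairs.
my $parser = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    push @hrefs, $attr{href} if $tag eq 'a' && defined $attr{href};
});
$parser->parse($html);
$parser->eof;    # tell the parser there is no more input

print "$_\n" for @hrefs;
```

Passing the page URL as a second argument to new() makes HTML::LinkExtor return absolute URLs, which is what getstore() needs when the page uses relative links like /b.zip.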
Secondly, your code uses a lot of references, but you dereference them in a really over-complicated way. For example, where you have \$$row[0], you can write \$row->[0] (the backslash is still needed, because parse() is given a reference to the HTML string). Similarly, $$s[$i]{href} will be easier for most people to understand if written as $s->[$i]{href}.
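To see that the two spellings really are the same thing, here is a tiny check on made-up data shaped like the question's $row (an array ref of HTML strings) and $s (an array ref of link hashes):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data shaped like the question's variables.
my $row = [ '<a href="a.zip">a</a>', '<a href="b.zip">b</a>' ];
my $s   = [ { _TEXT => 'a', href => 'a.zip' } ];

# $$s[0]{href} and $s->[0]{href} read the same hash element.
print "same element\n" if $$s[0]{href} eq $s->[0]{href};

# \$$row[0] and \$row->[0] are references to the same array element,
# so they compare equal as addresses.
print "same reference\n" if \$$row[0] == \$row->[0];
```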
Next, you use a C-style for loop to iterate over the array's indexes. It's usually simpler to use foreach over the range from zero to the last index of the array ($#$s):
foreach my $i (0 .. $#$s) {
print "$s->[$i]{_TEXT} $s->[$i]{href} $t->[$i]{href} \n";
my $file = "/tmp/$s->[$i]{_TEXT}";
my $url = $s->[$i]{href};
my $content = getstore($url, $file);
die "Couldn't get it!" unless defined $content;
}
And finally, you seem slightly confused about what getstore() returns. It returns the HTTP response code, so it will never be undefined. If there's a problem retrieving the content, you'll get a code such as 500 or 403 instead. LWP::Simple also exports the HTTP::Status functions, so you can test the code with is_success().
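Here is a sketch of the download loop with the status code checked via is_success(). To keep it self-contained, it fetches a file:// URL pointing at a temporary file instead of the real links, and $s is a hypothetical stand-in for what HTML::LinkExtractor returns.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);   # is_success() is re-exported from HTTP::Status
use File::Temp qw(tempdir);

# Temporary directory (removed at exit) with one source file to "download".
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/source.txt" or die "Can't write $dir/source.txt: $!";
print {$fh} "hello\n";
close $fh;

# Hypothetical link list: hashes with _TEXT and href keys, as in the question.
my $s = [
    { _TEXT => 'copy.txt', href => "file://$dir/source.txt" },
];

foreach my $link (@$s) {
    my $file = "$dir/$link->{_TEXT}";
    my $rc   = getstore($link->{href}, $file);   # returns the HTTP status code
    die "Couldn't get $link->{href}: HTTP $rc\n" unless is_success($rc);
    print "saved $link->{href} -> $file (status $rc)\n";
}
```

With real http:// links the loop is the same; only the contents of $s change.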