Reputation: 62
I'm trying to save the whole web page on my system as a .html
file and then parse that file, to find some tags and use them.
I'm able to save/parse http://<url>, but not https://<url>. I'm using Perl.
I'm using the following code to save the page. It works fine for HTTP but not for HTTPS:
use strict;
use warnings;
use LWP::UserAgent;
use LWP::Protocol::https;
use HTTP::Cookies;

sub main
{
    my $ua = LWP::UserAgent->new();
    my $cookies = HTTP::Cookies->new(
        file     => "cookies.txt",
        autosave => 1,
    );
    $ua->cookie_jar($cookies);
    $ua->agent("Google Chrome/30");
    #$ua->ssl_opts( SSL_ca_file => 'cert.pfx' );
    $ua->proxy('http','http://proxy.com');
    my $response = $ua->get('http://google.com');
    #$ua->credentials($response, "", "usrname", "password");
    # Stop here on failure so an error page is not saved.
    unless ($response->is_success) {
        die "Error: " . $response->status_line . "\n";
    }
    # Let's save the output.
    my $save = "save.html";
    open(my $fh, '>', $save)
        or die "\nCannot create save file '$save': $!\n";
    # Without this line, we may get a
    # 'wide characters in print' warning.
    binmode($fh, ":encoding(UTF-8)");
    print $fh $response->decoded_content;
    close $fh;
    # decoded_content is a character string, so length() counts characters.
    print "Saved ",
        length($response->decoded_content),
        " characters of data to '$save'.\n";
}
main();
Is it possible to parse an HTTPS page?
Upvotes: 1
Views: 3340
Reputation: 69264
Always worth checking the documentation for the modules that you're using...
You're using modules from libwww-perl. That includes a cookbook. And in that cookbook, there is a section about HTTPS, which says:
URLs with https scheme are accessed in exactly the same way as with http scheme, provided that an SSL interface module for LWP has been properly installed (see the README.SSL file found in the libwww-perl distribution for more details). If no SSL interface is installed for LWP to use, then you will get "501 Protocol scheme 'https' is not supported" errors when accessing such URLs.
The README.SSL file says this:
As of libwww-perl v6.02 you need to install the LWP::Protocol::https module from its own separate distribution to enable support for https://... URLs for LWP::UserAgent.
So you just need to install LWP::Protocol::https.
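As a rough sketch of how that check could look in practice: the script below first tests whether LWP::Protocol::https can be loaded and only then fetches the page. The helper names https_supported and save_page, the script name fetch.pl, and the example URL are made up for illustration; they are not part of libwww-perl.

```perl
use strict;
use warnings;

# Returns 1 if LWP::Protocol::https can be loaded on this system, 0 otherwise.
sub https_supported {
    return eval { require LWP::Protocol::https; 1 } ? 1 : 0;
}

# Hypothetical helper: fetch $url and write the decoded body to $file.
# Returns the number of characters written.
sub save_page {
    my ($url, $file) = @_;
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    my $response = $ua->get($url);
    die "Error fetching $url: " . $response->status_line . "\n"
        unless $response->is_success;
    open my $fh, '>:encoding(UTF-8)', $file
        or die "Cannot create '$file': $!\n";
    print {$fh} $response->decoded_content;
    close $fh or die "Cannot close '$file': $!\n";
    return length $response->decoded_content;
}

# Only runs when called with two arguments, e.g.:
#   perl fetch.pl https://example.com/ save.html
if (@ARGV == 2) {
    die "LWP::Protocol::https is not installed; try: cpan LWP::Protocol::https\n"
        unless https_supported();
    my $chars = save_page(@ARGV);
    print "Saved $chars characters to '$ARGV[1]'.\n";
}
```

If the module is missing, the script stops with an install hint instead of the "501 Protocol scheme 'https' is not supported" status line quoted above.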
Upvotes: 5
Reputation: 667
You need Crypt::SSLeay (https://metacpan.org/module/Crypt::SSLeay) for https links.
It provides SSL support for LWP.
Bit me in the ass with a project of my own.
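If you're not sure which SSL module your LWP ends up using, something like this might help. It's only a sketch: it loads Net::HTTPS (the class LWP uses for https connections) and reads its $SSL_SOCKET_CLASS variable, which should be either IO::Socket::SSL or Net::SSL (the latter ships with Crypt::SSLeay). The sub name ssl_backend is my own.

```perl
use strict;
use warnings;

# Returns the name of the SSL socket class Net::HTTPS selected,
# or undef if Net::HTTPS or an SSL backend could not be loaded.
sub ssl_backend {
    no warnings 'once';    # the variable is only named once in this file
    return scalar eval { require Net::HTTPS; $Net::HTTPS::SSL_SOCKET_CLASS };
}

my $backend = ssl_backend();
print defined $backend
    ? "https requests will use $backend\n"
    : "No SSL backend available: install LWP::Protocol::https or Crypt::SSLeay\n";
```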
Upvotes: 0