Reputation: 299
As the title says, WWW::Mechanize does not recognize
<base href="" />
if the page content is gzipped. Here is an example:
use strict;
use warnings;
use WWW::Mechanize;
my $url = 'http://objectmix.com/perl/356181-help-lwp-log-after-redirect.html';
my $mech = WWW::Mechanize->new;
$mech->get($url);
print $mech->base()."\n";
# force plain text instead of gzipped content
$mech->get($url, 'Accept-Encoding' => 'identity');
print $mech->base()."\n";
Output:
http://objectmix.com/perl/356181-help-lwp-log-after-redirect.html
http://objectmix.com/ <--- this is correct!
Am I missing something here? Thanks
Edit: I just tested it directly with LWP::UserAgent and it works without any problems:
use LWP::UserAgent;
my $ua = LWP::UserAgent->new();
my $res = $ua->get('http://objectmix.com/perl/356181-help-lwp-log-after-redirect.html');
print $res->base()."\n";
Output:
http://objectmix.com/
Does this look like a WWW::Mechanize bug?
Edit 2: It is an LWP or HTTP::Response bug, not a WWW::Mechanize one. LWP does not request gzip by default. If I set
$ua->default_header('Accept-Encoding' => 'gzip');
in the above example, it returns the wrong base.
Edit 3: The bug is in parse_head() in LWP/UserAgent.pm.
It calls HTML::HeadParser with the gzipped HTML, and HeadParser has no idea what to do with it. LWP should gunzip the content before calling the parsing subroutine.
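The failure mode described in Edit 3 can be reproduced offline. This is a minimal sketch, not LWP's actual parse_head() code: it builds a gzip-encoded HTTP::Response by hand (the example.com page and its markup are made up for illustration) and runs HTML::HeadParser over the raw body and then over decoded_content. HeadParser reports a <base href> as the Content-Base header, so only the decoded pass is expected to find it.

```perl
use strict;
use warnings;
use HTTP::Response;
use HTML::HeadParser;
use IO::Compress::Gzip qw(gzip $GzipError);

# Run HTML::HeadParser over a chunk of HTML and return the value it
# recorded for <base href="...">, which it stores as Content-Base.
sub head_base {
    my ($html) = @_;
    my $p = HTML::HeadParser->new;
    # parse() returns false once the parser has seen enough of <head>
    $p->parse($html) and $p->eof;
    return $p->header('Content-Base');
}

# Hypothetical page (assumption): only the <base> element matters here.
my $html = '<html><head><base href="http://example.com/" />'
         . '<title>demo</title></head><body>hi</body></html>';

# Compress it and wrap it in a response, as a server honouring
# "Accept-Encoding: gzip" would send it.
gzip(\$html => \my $gz) or die "gzip failed: $GzipError";
my $res = HTTP::Response->new(200, 'OK', [
    'Content-Type'     => 'text/html',
    'Content-Encoding' => 'gzip',
]);
$res->content($gz);

my $raw     = head_base($res->content);          # still-compressed bytes
my $decoded = head_base($res->decoded_content);  # gunzipped HTML text

print 'raw:     ', (defined $raw     ? $raw     : 'undef'), "\n";
print 'decoded: ', (defined $decoded ? $decoded : 'undef'), "\n";
```

Feeding the parser decoded_content instead of the raw body is the change Edit 3 argues LWP should make.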
Upvotes: 2
Views: 497
Reputation: 299
There is a bug report about this: https://rt.cpan.org/Public/Bug/Display.html?id=54361
Conclusion: LWP is missing this "feature".
WWW::Mechanize:
This could be worked around by overriding _make_request() in your own WWW::Mechanize subclass and resetting the HTTP::Response content to its decoded_content, or, even dirtier, by overwriting $mech->{base} with the base parsed from the content.
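A less invasive variant of the "dirtier" route, sketched under assumptions: instead of overriding the private _make_request(), it re-parses the HTML that WWW::Mechanize already holds ($mech->content, which the WWW::Mechanize documentation describes as ordinarily the same as the response's decoded_content) and resolves any <base href> against $mech->uri. The helper name effective_base is hypothetical, and it returns the base rather than writing into the private $mech->{base} field.

```perl
use strict;
use warnings;
use HTML::HeadParser;
use URI;

# Hypothetical helper: compute the page's base URL from the decoded HTML
# held by a WWW::Mechanize object, falling back to the current page URI.
sub effective_base {
    my ($mech) = @_;
    my $page_uri = $mech->uri;

    my $p = HTML::HeadParser->new;
    # parse() returns false once the parser has seen enough of <head>
    $p->parse($mech->content // '') and $p->eof;

    my $href = $p->header('Content-Base');
    return URI->new($page_uri) unless defined $href && length $href;

    # A relative <base href> is resolved against the page's own URI
    return URI->new_abs($href, $page_uri);
}
```

Usage would be along the lines of calling $mech->get($url) and then effective_base($mech) in place of $mech->base(). Assigning the result to $mech->{base}, as suggested above, would also work but depends on a private field.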
Upvotes: 1
Reputation: 5069
I think it is not a bug, it is a feature. WWW::Mechanize tries to be smart because some browsers act one way when they see 'base href=""' and others act differently.
What happens when the base is set properly?
I think it matters whether "" or "/" is used as the base:
<base href="" />
<base href="/" />
Upvotes: 0