toktok

Reputation: 299

WWW::Mechanize ignores base href on gzipped content

As the title says, WWW::Mechanize does not recognize

<base href="" /> 

if the page content is gzipped. Here is an example:

use strict;
use warnings;
use WWW::Mechanize;

my $url = 'http://objectmix.com/perl/356181-help-lwp-log-after-redirect.html';

my $mech = WWW::Mechanize->new;
$mech->get($url);
print $mech->base()."\n";

# force plain text instead of gzipped content
$mech->get($url, 'Accept-Encoding' => 'identity');
print $mech->base()."\n";

Output:

http://objectmix.com/perl/356181-help-lwp-log-after-redirect.html
http://objectmix.com/    <--- this is correct !

Am I missing something here? Thanks

Edit: I just tested it directly with LWP::UserAgent and it works without any problems:

use LWP::UserAgent;

my $ua = LWP::UserAgent->new();
my $res = $ua->get('http://objectmix.com/perl/356181-help-lwp-log-after-redirect.html');
print $res->base()."\n";

Output:

http://objectmix.com/ 

Does this look like a WWW::Mechanize bug?

Edit 2: It is an LWP or HTTP::Response bug, not a WWW::Mechanize bug. LWP does not request gzip by default. If I set

$ua->default_header('Accept-Encoding' => 'gzip');

in the LWP::UserAgent example above, it returns the wrong base as well.

Edit 3: The bug is in parse_head() in LWP/UserAgent.pm.

It calls HTML::HeadParser with the gzipped HTML, and HeadParser has no idea what to do with it. LWP should gunzip the content before calling the parsing subroutine.
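To illustrate the point without touching the network, here is a minimal sketch that builds a gzipped HTTP::Response by hand and then parses the decoded body with HTML::HeadParser. The helper name base_from_decoded is hypothetical (it is not part of LWP), and the HTML snippet is made up for the demonstration; only the URL comes from the example above.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;
use HTML::HeadParser;
use IO::Compress::Gzip qw(gzip $GzipError);
use URI;

# Hypothetical helper (not an LWP API): find <base href> in the *decoded*
# body instead of the raw, possibly gzipped, bytes.
sub base_from_decoded {
    my ($res) = @_;

    # decoded_content() undoes Content-Encoding (gzip here); undef on failure
    my $html = $res->decoded_content;
    return $res->base unless defined $html;

    my $parser = HTML::HeadParser->new;
    $parser->parse($html);

    # HTML::HeadParser exposes <base href="..."> as the Content-Base header
    my $href = $parser->header('Content-Base');
    return $res->base unless defined $href && length $href;

    my $req = $res->request;
    return $req ? URI->new_abs($href, $req->uri) : URI->new($href);
}

# Made-up page body, gzipped to mimic what the server sends
my $html = '<html><head><base href="http://objectmix.com/" />'
         . '<title>demo</title></head><body></body></html>';
gzip(\$html => \my $gzipped) or die "gzip failed: $GzipError";

my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html', 'Content-Encoding' => 'gzip' ],
    $gzipped,
);
$res->request(HTTP::Request->new(
    GET => 'http://objectmix.com/perl/356181-help-lwp-log-after-redirect.html',
));

print $res->base, "\n";               # no Content-Base header: falls back to the request URI
print base_from_decoded($res), "\n";  # base taken from the gunzipped HTML
```

The response here never went through LWP's own head parsing, so this only shows that the base is recoverable once the body has been decoded first; it is not a patch for LWP itself.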

Upvotes: 2

Views: 497

Answers (2)

toktok

Reputation: 299

There is a bug report about this: https://rt.cpan.org/Public/Bug/Display.html?id=54361

Conclusion: LWP is missing this "feature".

WWW::Mechanize:

This could eventually be solved by overriding _make_request() in your own subclass of WWW::Mechanize and resetting the HTTP::Response content from decoded_content, or, even dirtier, by overwriting $mech->{base} with the base parsed from the content.
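A rough sketch of the subclass approach follows. It is a variant of the idea above: instead of replacing the response content, it sets a Content-Base header on the response so that HTTP::Response's base() picks it up. The package name My::Mech is hypothetical, _make_request() is a private WWW::Mechanize method that may change between releases, and the sketch assumes WWW::Mechanize derives its base from the response this method returns.

```perl
package My::Mech;    # hypothetical subclass name

use strict;
use warnings;
use parent 'WWW::Mechanize';
use HTML::HeadParser;

# Override the private hook that performs the actual request, then fix up
# the response before WWW::Mechanize processes it.
sub _make_request {
    my $self = shift;
    my $res  = $self->SUPER::_make_request(@_);

    my $encoding = $res->header('Content-Encoding') // '';
    if ($encoding =~ /gzip/i && $res->content_is_html) {
        # Parse the gunzipped HTML, which the built-in head parsing did not see
        my $html = $res->decoded_content;
        if (defined $html) {
            my $parser = HTML::HeadParser->new;
            $parser->parse($html);
            my $href = $parser->header('Content-Base');

            # HTTP::Response::base() consults Content-Base before the request URI
            $res->header('Content-Base' => $href)
                if defined $href && length $href;
        }
    }
    return $res;
}

1;
```

It would be used in place of WWW::Mechanize in the question's script, i.e. My::Mech->new followed by the same get() and base() calls. Because it depends on module internals, treat it as a stopgap rather than a fix.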

Upvotes: 1

user1126070

Reputation: 5069

I think it is not a bug, it is a feature. WWW::Mechanize tries to be smart, because some browsers act one way when they see 'base href=""' and others act another way.

What happens when the base is set properly to ?

I think it matters whether you use "" or / as the base.

<base href="" /> 
<base href="/" /> 

Upvotes: 0
