user2348668

Reputation: 768

LWP is not following redirects

When I try to access Google, I get the error message "302 Found".

Here is the code:

package Spider;

use parent qw(LWP::UserAgent);

sub new {
  my $class = shift;
  my $self  = {};
  bless $self, $class;

  $self->agent("Mozilla/8.0");
  $self->timeout(20);

  return $self;
}

sub gethtml {
  my ($self, $url) = @_;

  my $response = $self->get($url);

  $response->is_success ?
      $response->decoded_content :
      $response->status_line;
}

1;

test.pl:

use feature 'say';
use Spider;

my $spider = Spider->new;
say $spider->gethtml('http://www.google.com/');

I have no idea why this is happening; I have used LWP without problems for a while.

Upvotes: 1

Views: 1081

Answers (2)

Borodin

Reputation: 126722

When you subclass a module and want to override one of its methods, it is more than likely that you need to execute the base class's method first and then add some tweaks of your own. You can do this using the SUPER pseudo-class, which allows you to call a method from the base class.

Your Spider module isn't doing the necessary LWP::UserAgent initialisation at all, so your $self is just a reference to an empty hash. It should look like this:

package Spider;

use strict;
use warnings;

use parent 'LWP::UserAgent';

sub new {
  my $class = shift;

  # let LWP::UserAgent do its own initialisation first
  my $self = $class->SUPER::new(@_);

  $self->agent('Mozilla/8.0');
  $self->timeout(20);

  return $self;
}

sub gethtml {
  my ($self, $url) = @_;

  my $response = $self->get($url);

  $response->is_success ?
      $response->decoded_content :
      $response->status_line;
}

1;

By the way, you should take care not to suck bandwidth out of sites that are meant to be accessed manually. Be nice, and in particular abide by the robots.txt file, which says how the site may be accessed by a spider program. You should take a look at LWP::RobotUA, which has been written expressly for this purpose.
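For example, here is a minimal sketch of the same Spider class based on LWP::RobotUA instead; the bot name and contact address are placeholders you would replace with your own:

package Spider;

use strict;
use warnings;

# LWP::RobotUA is a subclass of LWP::UserAgent that fetches and
# honours each site's robots.txt automatically
use parent 'LWP::RobotUA';

sub new {
  my $class = shift;

  # RobotUA requires an agent name and a contact address
  my $self = $class->SUPER::new(
    agent => 'MySpider/0.1',
    from  => 'me@example.com',
  );

  # wait at least one minute between requests to the same site
  $self->delay(1);

  return $self;
}

1;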

Upvotes: 3

nick_v1

Reputation: 1664

The last I heard, LWP::UserAgent doesn't follow 3xx redirects by default.
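If redirect handling is the issue, you can also configure it explicitly on a plain LWP::UserAgent. A minimal sketch using standard LWP::UserAgent methods:

use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# allow up to seven redirects per request (this is also the default)
$ua->max_redirect(7);

# only GET and HEAD are redirectable out of the box; add POST if you need it
push @{ $ua->requests_redirectable }, 'POST';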

Try using WWW::Mechanize instead. It is much better suited to crawling the web, and it gives you much more control.
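Here is a minimal sketch of the same fetch with WWW::Mechanize; the URL is taken from the question:

use strict;
use warnings;
use WWW::Mechanize;

# autocheck makes Mechanize die on HTTP errors instead of returning them
my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->agent('Mozilla/8.0');

# get() follows redirects and keeps the final page
$mech->get('http://www.google.com/');
print $mech->content;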

Upvotes: -1
