Reputation: 34034
I'm trying to fetch Wikipedia pages using LWP::Simple, but they're not coming back. This code:
#!/usr/bin/perl
use strict;
use LWP::Simple;
print get("http://en.wikipedia.org/wiki/Stack_overflow");
doesn't print anything. But if I use some other webpage, say http://www.google.com, it works fine.
Is there some other name that I should be using to refer to Wikipedia pages?
What could be going on here?
Upvotes: 11
Views: 3681
Reputation: 86
I solved this problem by using LWP::RobotUA instead of LWP::UserAgent. The documentation linked below covers it; very little of your code needs to change.
http://lwp.interglacial.com/ch12_02.htm
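A minimal sketch of that switch, assuming the agent name, contact address, and delay below (all placeholders, not values Wikipedia requires). LWP::RobotUA needs both an agent name and a "from" address, and it checks the site's robots.txt and paces requests before fetching:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;

# Agent name and contact address are placeholders; substitute your own.
my $robot = LWP::RobotUA->new(
    agent => 'WikiFetchExample/0.1',
    from  => 'you@example.com',
);

# delay() is in minutes; the default is 1 minute between requests to a host.
$robot->delay(0.1);

# Only fetch when a URL is given on the command line.
if (@ARGV) {
    my $res = $robot->get($ARGV[0]);
    if ($res->is_success) {
        print $res->content;
    }
    else {
        warn "Request failed: ", $res->status_line, "\n";
    }
}
```

Run it as, for example, perl fetch_robot.pl http://en.wikipedia.org/wiki/Stack_overflow (the script name is arbitrary). Everything else works as with LWP::UserAgent, since LWP::RobotUA is a subclass of it.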
Upvotes: 6
Reputation: 15063
You can also just set the UA on the LWP::Simple module - just import the $ua variable, and it'll allow you to modify the underlying UserAgent:
use LWP::Simple qw/get $ua/;
$ua->agent("WikiBot/0.1");
print get("http://en.wikipedia.org/wiki/Stack_overflow");
Upvotes: 11
Reputation: 1983
Also see the MediaWiki-related CPAN modules. They are designed to talk to MediaWiki sites (of which Wikipedia is one) and might give you more bells and whistles than plain LWP.
http://cpan.uwinnipeg.ca/search?query=Mediawiki&mode=dist
Upvotes: 5
Reputation: 34034
Apparently Wikipedia blocks LWP::Simple requests: http://www.perlmonks.org/?node_id=695886
The following works instead:
#!/usr/bin/perl
use strict;
use LWP::UserAgent;
my $url = "http://en.wikipedia.org/wiki/Stack_overflow";
my $ua = LWP::UserAgent->new();
my $res = $ua->get($url);
print $res->content;
Upvotes: 18
Reputation: 1259
Because Wikipedia is blocking the HTTP User-Agent string used by LWP::Simple.
You will get a "403 Forbidden" response if you try using it.
To work around this, use the LWP::UserAgent module and set its agent attribute.
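Something along these lines should do it. This is only a sketch: the agent string is a made-up placeholder, and make_ua and fetch_page are helper names invented for this example, not part of LWP. Checking the status line also makes a 403 visible instead of silently printing nothing:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Build a user agent that sends an explicit User-Agent header.
sub make_ua {
    my ($agent) = @_;
    return LWP::UserAgent->new(agent => $agent, timeout => 10);
}

# Return the page body, or die with the HTTP status line (e.g. "403 Forbidden").
sub fetch_page {
    my ($ua, $url) = @_;
    my $res = $ua->get($url);
    die "Request for $url failed: " . $res->status_line . "\n"
        unless $res->is_success;
    return $res->decoded_content;
}

# Placeholder agent string; pick one that identifies your script.
my $ua = make_ua('WikiFetchExample/0.1 (you@example.com)');

# Only fetch when a URL is given on the command line.
if (@ARGV) {
    binmode(STDOUT, ':encoding(UTF-8)');
    print fetch_page($ua, $ARGV[0]);
}
```

Pass the URL as the first argument, for example perl fetch.pl http://en.wikipedia.org/wiki/Stack_overflow (the script name is arbitrary).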
Upvotes: 5