Tirumalesh

Reputation: 1

Unable to get the web content using LWP::Simple but able to get content from LWP::UserAgent

I am trying to run the code below to fetch and parse the contents of the HTML page at the following URL:

#!/usr/bin/perl
use LWP::Simple;
use HTML::TreeBuilder;
$response = get("http://www.viki.com/");
print $response;

Nothing gets printed. The request works when it emulates a browser.

Upvotes: 0

Views: 549

Answers (1)

mttrb

Reputation: 8345

When I try to access http://www.viki.com using LWP::UserAgent I get the following response:

<html><body><h1>403 Forbidden</h1>
Request forbidden by administrative rules.
</body></html>

The get subroutine in LWP::Simple is implemented as follows (at least in version 6.13).

sub get ($)
{
    my $response = $ua->get(shift);
    return $response->decoded_content if $response->is_success;
    return undef;
}

As you can see, get returns the content only if the response is a success; otherwise it returns undef.

The response from LWP::UserAgent is a 403 error, in other words not a success. Therefore, LWP::Simple will return undef for the same URL.
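To see why get comes back empty, it can help to look at the status line rather than only the content. The following is a minimal sketch, not part of LWP::Simple: content_or_reason is a hypothetical helper that applies the same is_success check as get but also returns the status line on failure. It builds a stand-in 403 response locally, so it runs without network access.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper (not part of LWP): same success check as
# LWP::Simple::get, but it also hands back the status line on failure
# instead of silently returning undef.
sub content_or_reason {
    my ($response) = @_;
    return ($response->decoded_content, undef) if $response->is_success;
    return (undef, $response->status_line);
}

# Stand-in for the 403 response shown above; built locally for illustration.
my $forbidden = HTTP::Response->new(403, 'Forbidden');

my ($content, $reason) = content_or_reason($forbidden);
print defined $content ? $content : "Request failed: $reason\n";
```

With a real request you would pass in the object returned by an LWP::UserAgent instance's get method instead of the locally built response.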

It appears that the website (http://www.viki.com) is checking the user agent string and only returning content to "valid" user agents. By default, LWP::Simple sets its user agent string to LWP::Simple/$VERSION.

If you really must use LWP::Simple then you could force the user agent like this:

use LWP::Simple qw/ get $ua /;

$ua->agent('Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0');

print get('http://www.viki.com');

LWP::Simple exposes the LWP::UserAgent instance that it uses internally as the optionally exported $ua variable. You still need to set the user agent string on this instance to get this particular page to load.
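If you are not tied to LWP::Simple, the same idea can be sketched with LWP::UserAgent directly, which also lets you report the failure reason. This is only an illustration: the 10-second timeout is an assumed value, and whether the site accepts this user agent string today is not guaranteed.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Browser-like user agent string, as in the LWP::Simple example above.
# The timeout is an assumption for this sketch, not something the site requires.
my $ua = LWP::UserAgent->new(
    agent   => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0',
    timeout => 10,
);

my $response = $ua->get('http://www.viki.com');

if ($response->is_success) {
    print $response->decoded_content;
}
else {
    # Unlike LWP::Simple::get, the response object says what went wrong.
    warn 'Request failed: ', $response->status_line, "\n";
}
```

Checking status_line in the failure branch makes a 403 like the one above visible immediately instead of showing up as empty output.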

Upvotes: 3
