Siva

Reputation: 19

Getting a 403 Forbidden error when requesting a URL with Perl

use strict;
use warnings;
use LWP::UserAgent;

my $UserAgent = LWP::UserAgent->new;

my $response = $UserAgent->get("https://scholar.google.co.in/scholar_lookup?author=N.+R.+Alpert&author=S.+A.+Mohiddin&author=D.+Tripodi&author=J.+Jacobson-Hatzell&author=K.+Vaughn-Whitley&author=C.+Brosseau+&publication_year=2005&title=Molecular+and+phenotypic+effects+of+heterozygous,+homozygous,+and+compound+heterozygote+myosin+heavy-chain+mutations&journal=Am.+J.+Physiol.+Heart+Circ.+Physiol.&volume=288&pages=H1097-H1102");

if ($response->is_success)
{
    # Print the page title without the " - Google Scholar" suffix,
    # but only if the pattern actually matched
    if ($response->content =~ /<title>(.*?) - Google Scholar<\/title>/)
    {
        print $1;
    }
}
else
{
    die $response->status_line;
}

I get the following error when I run this script:

403 Forbidden at D:\Getelement.pl line 52.

When I paste this URL into the browser's address bar, it opens the expected page, but the same request fails when it is made from the script.

Can you please help me with this issue?

Upvotes: 0

Views: 453

Answers (3)

JRFerguson

Reputation: 7526

You can fetch your content if you add a User Agent string to identify yourself to the web server:

...
my $UserAgent = LWP::UserAgent->new;
$UserAgent->agent('Mozilla/5.0'); #...add this...
...
print $1;
...

This prints: "Molecular and phenotypic effects of heterozygous, homozygous, and compound heterozygote myosin heavy-chain mutations"
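For reference, here is a minimal sketch of how the pieces might fit together, with the agent string set and the title match guarded so that a page without the expected title does not print an undefined `$1`. The helper names `make_agent` and `extract_title` are invented for this illustration, the HTML snippet is made up, and `Mozilla/5.0` is simply the value used above; there is no guarantee that Google will keep accepting such requests.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: build a user agent that sends a browser-like string
sub make_agent {
    my $ua = LWP::UserAgent->new;
    $ua->agent('Mozilla/5.0');
    return $ua;
}

# Hypothetical helper: return the title without the " - Google Scholar"
# suffix, or undef if the page does not contain the expected title
sub extract_title {
    my ($html) = @_;
    return $html =~ m{<title>(.*?) - Google Scholar</title>}s ? $1 : undef;
}

# Offline demonstration on a made-up HTML snippet (no network access)
my $sample = '<html><head><title>Some paper - Google Scholar</title></head></html>';
my $title  = extract_title($sample);
print defined $title ? "$title\n" : "no title found\n";
```

Against the real page, the same helpers would be used as `my $response = make_agent()->get($url);` followed by `extract_title($response->decoded_content)` once `$response->is_success` has been checked.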

Upvotes: 0

Michael Schm.

Reputation: 2534

Google has blacklisted LWP::UserAgent. They blacklisted either the User-Agent string or other parts of the request (headers and so on).

I suggest you use Mojo::UserAgent. By default, its requests look more like a browser's. You need as little as one line of code.

use Mojo::UserAgent;
use strict;
use warnings;

print Mojo::UserAgent->new->get('https://scholar.google.co.in/scholar_lookup?author=N.+R.+Alpert&author=S.+A.+Mohiddin&author=D.+Tripodi&author=J.+Jacobson-Hatzell&author=K.+Vaughn-Whitley&author=C.+Brosseau+&publication_year=2005&title=Molecular+and+phenotypic+effects+of+heterozygous,+homozygous,+and+compound+heterozygote+myosin+heavy-chain+mutations&journal=Am.+J.+Physiol.+Heart+Circ.+Physiol.&volume=288&pages=H1097-H1102')->res->dom->at('title')->text;

# Prints Molecular and phenotypic effects of heterozygous, homozygous, and      
# compound heterozygote myosin heavy-chain mutations - Google Scholar

Terms

The code does not accept any terms, and no additional lines have been added to bypass security checks. It's absolutely fine.

Upvotes: 0

sidyll

Reputation: 59297

Google's Terms of Service disallow automated searches. They detect that you are sending this request from a script because its headers are very different from the standard headers a browser sends; you can compare them yourself if you want.

In the past Google offered a SOAP API, and you could use modules like WWW::Search::Google, but that is no longer possible because the API was deprecated.
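One way to look at the headers a script would send is to build the request and let LWP fill in its defaults without actually sending anything. The sketch below assumes a placeholder URL (`https://example.com/`) and the variable name `$default_agent`, both chosen only for illustration; the exact version number in the output depends on the installed LWP.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;

# Placeholder URL for illustration; nothing is sent over the network here
my $req = HTTP::Request->new(GET => 'https://example.com/');

# prepare_request adds the default headers LWP would set before sending
$ua->prepare_request($req);

# With no agent configured, this is the library's own identifier
my $default_agent = $req->header('User-Agent');

# Dump the request line and headers as they would go out
print $req->as_string;
```

Comparing this output with the request headers shown in a browser's developer tools makes the difference visible: an unconfigured LWP identifies itself as `libwww-perl/` followed by its version, which is easy for a server to recognise.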

Alternatives were already discussed in the following StackOverflow question:

Upvotes: 2
