Reputation: 19
use strict;
use LWP::UserAgent;
my $UserAgent = LWP::UserAgent->new;
my $response = $UserAgent->get("https://scholar.google.co.in/scholar_lookup?author=N.+R.+Alpert&author=S.+A.+Mohiddin&author=D.+Tripodi&author=J.+Jacobson-Hatzell&author=K.+Vaughn-Whitley&author=C.+Brosseau+&publication_year=2005&title=Molecular+and+phenotypic+effects+of+heterozygous,+homozygous,+and+compound+heterozygote+myosin+heavy-chain+mutations&journal=Am.+J.+Physiol.+Heart+Circ.+Physiol.&volume=288&pages=H1097-H1102");
if ($response->is_success)
{
# Print the captured title only when the pattern actually matched
if ($response->content =~ /<title>(.*?) - Google Scholar<\/title>/)
{
print $1;
}
}
else
{
die $response->status_line;
}
I am getting the below error while running this script.
403 Forbidden at D:\Getelement.pl line 52.
When I paste this URL into the browser's address bar, it opens the expected page, but the same request fails when I send it from the script.
Can you please help me with this issue?
Upvotes: 0
Views: 453
Reputation: 7526
You can fetch your content if you add a User Agent string to identify yourself to the web server:
...
my $UserAgent = LWP::UserAgent->new;
$UserAgent->agent('Mozilla/5.0'); #...add this...
...
print $1;
...
This prints: "Molecular and phenotypic effects of heterozygous, homozygous, and compound heterozygote myosin heavy-chain mutations"
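Putting the fragments above together, a complete script might look like the sketch below. The helper names `extract_title` and `fetch_title`, the 15-second timeout, and passing the URL as a command-line argument are assumptions made for illustration; they are not part of the original script. Whether the server keeps accepting a bare `Mozilla/5.0` agent string is also not guaranteed.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Pull the page title out of an HTML string, dropping an optional
# " - Google Scholar" suffix. Returns undef when no <title> is found.
sub extract_title {
    my ($html) = @_;
    return unless defined $html;
    return $1 if $html =~ m{<title>(.*?)(?: - Google Scholar)?</title>}s;
    return;
}

# Fetch a URL with a browser-like User-Agent string and return its title.
# Dies with the HTTP status line (e.g. "403 Forbidden") on failure.
sub fetch_title {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(timeout => 15);   # timeout is an assumption
    $ua->agent('Mozilla/5.0');   # replaces the default "libwww-perl/x.xx" string
    my $response = $ua->get($url);
    die $response->status_line, "\n" unless $response->is_success;
    return extract_title($response->decoded_content);
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $title = fetch_title($ARGV[0]);
    print defined $title ? "$title\n" : "No <title> found\n";
}
```

Run it as `perl script.pl "<url>"` (the script name is hypothetical). Note that LWP needs the LWP::Protocol::https module installed to fetch `https://` URLs.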
Upvotes: 0
Reputation: 2534
Google has blacklisted LWP::UserAgent
They have blacklisted either its User-Agent string or other parts of the request, such as its headers.
I suggest you use Mojo::UserAgent. By default, its requests look more like those of a browser, and you need only one line of code.
use Mojo::UserAgent;
use strict;
use warnings;
print Mojo::UserAgent->new->get('https://scholar.google.co.in/scholar_lookup?author=N.+R.+Alpert&author=S.+A.+Mohiddin&author=D.+Tripodi&author=J.+Jacobson-Hatzell&author=K.+Vaughn-Whitley&author=C.+Brosseau+&publication_year=2005&title=Molecular+and+phenotypic+effects+of+heterozygous,+homozygous,+and+compound+heterozygote+myosin+heavy-chain+mutations&journal=Am.+J.+Physiol.+Heart+Circ.+Physiol.&volume=288&pages=H1097-H1102')->res->dom->at('title')->text;
# Prints Molecular and phenotypic effects of heterozygous, homozygous, and
# compound heterozygote myosin heavy-chain mutations - Google Scholar
Terms
The code neither accepts any terms nor adds extra lines to bypass security checks, so it is absolutely fine.
Upvotes: 0
Reputation: 59297
Google's Terms of Service disallow automated searches. Google detects that the request comes from a script because the headers your script sends differ greatly from the standard headers your browser sends; you can compare them yourself if you want.
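To see part of that difference from the Perl side, you can dump the headers that LWP::UserAgent attaches to every request by default. This is only a rough sketch: the helper name `default_header_dump` is made up for illustration, and the dump covers the user agent's default headers only, not every header that ends up on the wire.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Return the default request headers of a fresh LWP::UserAgent as text.
# If an agent string is passed, it overrides the built-in User-Agent first.
sub default_header_dump {
    my ($agent) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->agent($agent) if defined $agent;
    return $ua->default_headers->as_string;
}

print "Default:\n",    default_header_dump(),              "\n";
print "Overridden:\n", default_header_dump('Mozilla/5.0'), "\n";
```

The first dump should show a `User-Agent` line beginning with `libwww-perl/`, which is easy for a server to recognise. Compare it with the request headers shown in your browser's developer tools.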
In the old days they had a SOAP API, and you could use modules like WWW::Search::Google, but that's not the case anymore because this API was deprecated.
Alternatives were already discussed in the following StackOverflow question:
Upvotes: 2