Reputation: 45
I am trying to write a script that navigates a soccer website to the player of my choice and scrapes their info for me. The scraping part works when I hard-code an individual player's page, but implementing the navigation is giving me problems. The website in question is http://www.soccerbase.com.
I have to fill in the form at the top of the page with the player's name, then submit it to run the search. I have tried it two different ways (one of them is commented out), based on info I found online, but to no avail. I am an absolute novice when it comes to Perl, so any help would be greatly appreciated! Thanks in advance. Here is my code:
#!/usr/bin/perl
use strict;
require WWW::Mechanize;
require HTML::TokeParser;
my $player = 'Luis Antonio Valencia';
#die "Must provide a player's name" unless $player ne 1;
my $agent = WWW::Mechanize->new();
$agent->get('http://www.soccerbase.com/players/home.sd');
$agent->form_name('headSearch');
$agent->set_fields('searchTeamField', $player);
$agent->click_button(name=>"Search");
#$agent->submit_form(
# form_number => 1,
# fields => { => 'Luis Antonio Valencia', }
# );
my $stream = HTML::TokeParser->new(\$agent->{content});
my $player_name;
$stream->get_tag("strong");
$player_name = $stream->get_trimmed_text("/strong");
print "\n", "Player Name: ", $player_name, "\n";
Upvotes: 2
Views: 580
Reputation: 39158
It's a bit tricky because the form action plays switcheroo with JavaScript, but HTML::Form is able to handle that perfectly fine:
#!/usr/bin/env perl
use WWW::Mechanize qw();
use URI qw();
my $player = 'Luis Antonio Valencia';
my $agent = WWW::Mechanize->new;
$agent->get('http://www.soccerbase.com/players/home.sd');
my $form = $agent->form_id('headSearch');
{
    # the form action requires an absolute URI
    my $search_uri = $agent->uri;
    $search_uri->path('/players/search.sd');
    $form->action($search_uri);
}
$agent->submit_form(
    fields => {
        search => $player,
        type   => 'player',
    }
);
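Once the form has been submitted, the response body is available through the `$agent->content` method, so there is no need to reach into `$agent->{content}` as the question does. Below is a minimal sketch of the question's "first `<strong>` tag" extraction. The helper name `first_strong_text` and the sample HTML are made up for illustration; the real page markup may differ.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the trimmed text of the first <strong>
# element in the given HTML, or undef if there is none.
sub first_strong_text {
    my ($html) = @_;
    my $stream = HTML::TokeParser->new(\$html)
        or die "Could not create parser\n";
    return unless $stream->get_tag('strong');
    return $stream->get_trimmed_text('/strong');
}

# Made-up stand-in for what $agent->content might return after the search.
my $html = '<html><body><h1><strong> Luis Antonio Valencia </strong></h1></body></html>';
my $name = first_strong_text($html);
print "Player Name: ", (defined $name ? $name : '(not found)'), "\n";
```

In the script above you would pass `$agent->content` instead of the sample string.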
Upvotes: 3
Reputation: 118166
It looks like the form elements do not have name attributes, and I am assuming the query string is formed by some other means, translating the id attributes to yield:
http://www.soccerbase.com/players/search.sd?search=Luis+Antonio+Valencia&type=player
You'd think the following would work, but it doesn't, suggesting that there is some other JavaScript goodness(!) happening behind the scenes.
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TableExtract;
use LWP::Simple qw(get);
use URI;
my $player = 'Luis Antonio Valencia';
my $uri = URI->new('http://www.soccerbase.com/players/home.sd');
$uri->query_form(
search => $player,
type => 'player',
);
my $content = get "$uri";
die "Failed to get '$uri'\n" unless defined $content;
my $te = HTML::TableExtract->new(
attribs => { class => 'clubInfo' },
);
$te->parse($content);
die unless $te->tables;
my ($table) = $te->tables;
my ($row) = $table->rows;
print $row->[1], "\n";
Upvotes: 1
Reputation: 8895
An easier way is to look at the HTTP request the page makes, for instance:
http://www.soccerbase.com/players/search.sd?search=kkkk&type=player
Here 'kkkk' is the player name. Use LWP::UserAgent to make that request and it will give you the result; change 'kkkk' to the name of the player you are looking for, and that will do the job. Using Mech for this is overkill, if you ask me. If the player name contains spaces or other special characters, make sure you URL-encode it.
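As a rough sketch of that approach: the URI module's `query_form` can do the encoding, and LWP::UserAgent makes the request. The helper name `build_search_uri` and the 10-second timeout are my own choices, and this assumes the site still answers a plain GET on that search URL.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use LWP::UserAgent;
use URI;

# Hypothetical helper: build the search URL for a player.
# query_form URL-encodes the name (spaces become '+').
sub build_search_uri {
    my ($player) = @_;
    my $uri = URI->new('http://www.soccerbase.com/players/search.sd');
    $uri->query_form(search => $player, type => 'player');
    return $uri;
}

my $uri = build_search_uri('Luis Antonio Valencia');
print "Requesting $uri\n";

# The timeout value is an arbitrary choice for this sketch.
my $ua = LWP::UserAgent->new(timeout => 10);
my $response = $ua->get($uri);
if ($response->is_success) {
    my $html = $response->decoded_content;
    printf "Fetched %d characters of HTML\n", length $html;
}
else {
    warn "Request failed: ", $response->status_line, "\n";
}
```

From there, `$html` can be handed to whatever parser you already use for the player page.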
Upvotes: 1