conorw89

Reputation: 45

Perl: Problems with WWW::Mechanize and a form

I am trying to write a script that navigates a soccer website to the player of my choice and scrapes their info for me. The scraping part works when I hard-code an individual player's page, but implementing the navigation is giving me some problems. The website in question is http://www.soccerbase.com.

I have to fill in the form at the top of the page with the player's name, then submit it to run the search. I have tried two different ways (one of them is commented out below), based on info I found online, but to no avail. I am an absolute novice when it comes to Perl, so any help would be greatly appreciated! Thanks in advance. Here is my code:

#!/usr/bin/perl
use strict;

require WWW::Mechanize;
require HTML::TokeParser;

my $player = 'Luis Antonio Valencia';
#die "Must provide a player's name" unless $player ne 1;

my $agent = WWW::Mechanize->new();
$agent->get('http://www.soccerbase.com/players/home.sd');
$agent->form_name('headSearch');
$agent->set_fields('searchTeamField', $player);
$agent->click_button(name=>"Search");

#$agent->submit_form(
#       form_number => 1,
#       fields    => {   => 'Luis Antonio Valencia', }    
#   );

my $stream = HTML::TokeParser->new(\$agent->{content});
my $player_name;

$stream->get_tag("strong");
$player_name = $stream->get_trimmed_text("/strong");

print "\n", "Player Name: ", $player_name, "\n";

Upvotes: 2

Views: 580

Answers (3)

daxim

Reputation: 39158

It's a bit tricky because the form action plays switcharoo with JavaScript, but HTML::Form is able to handle that perfectly fine:

#!/usr/bin/env perl
use strict;
use warnings;
use WWW::Mechanize qw();
use URI qw();

my $player = 'Luis Antonio Valencia';
my $agent = WWW::Mechanize->new;
$agent->get('http://www.soccerbase.com/players/home.sd');

# Select the search form; form_id returns the HTML::Form object.
my $form = $agent->form_id('headSearch');
{
    # The page sets the form action via JavaScript, so point it at the
    # search URL by hand. HTML::Form requires an absolute URI here.
    my $search_uri = $agent->uri;
    $search_uri->path('/players/search.sd');
    $form->action($search_uri);
}
$agent->submit_form(
    fields => {
        search => $player,
        type   => 'player',
    }
);

Upvotes: 3

Sinan Ünür

Reputation: 118166

It looks like the form elements do not have name attributes, so I am assuming the query string is built by some other means, translating the id attributes to yield:

http://www.soccerbase.com/players/search.sd?search=Luis+Antonio+Valencia&type=player

You'd think the following would work, but it doesn't, suggesting that there is some other JavaScript goodness(!) happening behind the scenes.

#!/usr/bin/env perl

use strict;
use warnings;

use HTML::TableExtract;
use LWP::Simple qw(get);
use URI;

my $player = 'Luis Antonio Valencia';

my $uri = URI->new('http://www.soccerbase.com/players/home.sd');
$uri->query_form(
    search => $player,
    type   => 'player',
);

my $content = get "$uri";
die "Failed to get '$uri'\n" unless defined $content;

my $te = HTML::TableExtract->new(
    attribs => { class => 'clubInfo' },
);

$te->parse($content);
die unless $te->tables;

my ($table) = $te->tables;
my ($row) = $table->rows;

print $row->[1], "\n";

Upvotes: 1

snoofkin

Reputation: 8895

An easier way is to look at the HTTP request the search makes, for instance:

http://www.soccerbase.com/players/search.sd?search=kkkk&type=player

Here 'kkkk' is the player name. Use LWP::UserAgent to make that request, replacing 'kkkk' with the name of the player you want info for, and it will give you the result. Using Mech for this is overkill, if you ask me. If the player name contains spaces or other special characters, make sure to URL-encode it.
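A minimal sketch of that approach, assuming the search URL above still responds the way it did when this was written. The sub names build_search_uri and fetch_search_page are made up for illustration. URI's query_form does the encoding, so spaces in the name become '+':

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use LWP::UserAgent;
use URI;

# Build the search URL for a player name. query_form escapes the
# values, so a name with spaces is safe to pass in as-is.
sub build_search_uri {
    my ($player) = @_;
    my $uri = URI->new('http://www.soccerbase.com/players/search.sd');
    $uri->query_form(search => $player, type => 'player');
    return $uri;
}

# Fetch the search result page and return its decoded HTML.
# Dies with the HTTP status line if the request fails.
# (Hypothetical helper; it is defined here but not called below.)
sub fetch_search_page {
    my ($player) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 10);
    my $res = $ua->get(build_search_uri($player));
    die 'Request failed: ', $res->status_line, "\n" unless $res->is_success;
    return $res->decoded_content;
}

# Print the URL that would be requested for this player.
print build_search_uri('Luis Antonio Valencia'), "\n";
```

Calling fetch_search_page('Luis Antonio Valencia') would then hand you the HTML to feed into HTML::TokeParser or HTML::TableExtract, as in the other posts. Whether the site redirects straight to the player page or shows a list of matches is something to check by hand.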

Upvotes: 1
