Reputation: 751
Autotrader's website has a form called 'searchVehicles'. In that form are a number of input fields whose names match qr/^(radius|make|model|price-to)$/. These are, as you'd expect, drop-down lists. How do I pull the string values of their <option> elements?
So far I have the following Perl code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use utf8;
    use WWW::Mechanize;
    use Data::Dumper;
    use JSON;
    binmode STDOUT, ':encoding(UTF-8)';
    binmode STDIN, ':encoding(UTF-8)';
    my $url = 'https://www.autotrader.co.uk/';
    my $mech = WWW::Mechanize -> new( autocheck => 1 );
    $mech -> agent_alias( 'Linux Mozilla');
    if ($mech -> status( $mech -> get($url)) == 200)
    {
        $mech -> form_name('searchVehicles');
        my @inputs = $mech -> find_all_inputs(
            name_regex => qr/^(radius|make|model|price-to)$/,
            type => 'option',);
        print Dumper \@inputs;
    };
My result is like:
    $VAR1 = [
        bless( {
            'idx' => 1,
            'type' => 'option',
            'current' => 0,
            'name' => 'radius',
            'menu' => [
                {
                    'name' => 'Select the distance',
                    'seen' => 1,
                    'value' => ''
                }
            ],
            'id' => 'radius',
            'aria-label' => 'Choose a radius',
            'class' => 'c-form__select'
        }, 'HTML::Form::ListInput' ),
        bless( {
        }, 'HTML::Form::ListInput' ),
        bless( {
        }, 'HTML::Form::ListInput' ),
        bless( {
        }, 'HTML::Form::ListInput' )
    ];
Note: I have truncated all but the first since you get the idea with the first.
How do I get at the 'HTML::Form::ListInput' values?
Upvotes: 1
Views: 117
Reputation: 66883
The documentation for WWW::Mechanize::find_all_inputs says
find_all_inputs() returns an array of all the input controls in the current form whose properties match all of the regexes passed in. The controls returned are all descended from HTML::Form::Input. See "INPUTS" in HTML::Form for details.
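So your @inputs is an array of objects derived from HTML::Form::Input (here, HTML::Form::ListInput), not plain data structures.
From the given link to the INPUTS section of HTML::Form we find methods like name, value, value_names, and many others. These are what you use to pull out the needed information, rather than reaching into the object's hash directly.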
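As a minimal offline sketch of those accessor methods, the following parses a small hand-written form with HTML::Form directly. The HTML snippet, the form name 'demo' and the base URL are made up for illustration and are not taken from the Autotrader site; possible_values should give the submitted values and value_names the visible labels.

```perl
use strict;
use warnings;
use feature 'say';
use HTML::Form;

# Hypothetical stand-in for a page with a drop-down list
my $html = <<'HTML';
<form name="demo" action="/car-search">
  <select name="radius">
    <option value="">Select the distance</option>
    <option value="1">Within 1 mile</option>
    <option value="5">Within 5 miles</option>
  </select>
</form>
HTML

# parse() returns one HTML::Form object per <form>; take the first
my ($form) = HTML::Form->parse( $html, 'http://example.com/' );

# Look the control up by name; a <select> has type 'option'
my $input = $form->find_input('radius');

my @values = $input->possible_values;   # the value="..." attributes
my @labels = $input->value_names;       # the text of each <option>

say "name: ", $input->name;
say "type: ", $input->type;
say "value '$values[$_]' has label '$labels[$_]'" for 0 .. $#values;
```

The objects returned by find_all_inputs in WWW::Mechanize are of the same kind, so the same method calls apply to each element of @inputs.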
Update for the actual site that was provided
However, pages on this site are organized in a way that requires a bit more work. The page at the given URL does not list the option values anywhere in its source (use Ctrl-U to view it); its form only hands off to /car-search through its action.
Once we load that page instead, the approach discussed above applies. The code below only aims to show how to retrieve the information; please adjust it for maintainability.
    use warnings;
    use strict;
    use feature 'say';
    use WWW::Mechanize;
    use open ':std', ':encoding(UTF-8)';

    #my $url = 'https://www.autotrader.co.uk';
    my $url = 'https://www.autotrader.co.uk/car-search';

    # autocheck => 1 makes get() die on a failed request
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get($url);
    my $status = $mech->status;
    die "Got $status\n" if $status != 200;

    # Select the second form on the page
    $mech->form_number(2);

    my @inputs = $mech->find_all_inputs(
        name_regex => qr/^(radius|make|model|price-to)$/,
        type       => 'option',
    );

    foreach my $input (@inputs) {
        say "input: ", $input->name;
        say join ' ', $input->possible_values;   # values submitted with the form
        say "\t$_" for $input->value_names;      # labels shown in the drop-down
    }
Output, with categories truncated for convenience
    input: radius
    1500 1 5 10 15 20 25 30 35 40 45 50 55 60 70 80 90 100 200
        Distance (national)
        Within 1 mile
        Within 5 miles
        ...
    input: price-to
    500 1000 1500 2000 2500 3000 3500 4000 4500 5000 5500 [...]
        (any)
        £500 (934)
        £1,000 (7,683)
        ...
Only radius and price-to are in this form. The other terms of interest are in different HTML elements further down the source, and have to be retrieved in other ways.
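Since possible_values and value_names come back as two parallel lists, it can be handy to pair them up per control. Here is a rough sketch of one way to do that; select_options is a hypothetical helper name, and the sample HTML is invented so the example runs without network access. With WWW::Mechanize, the form object could presumably be obtained from $mech->current_form instead of HTML::Form->parse.

```perl
use strict;
use warnings;
use feature 'say';
use HTML::Form;

# Hypothetical helper: for every <select> whose name matches $name_re,
# return name => [ [value, label], ... ] in document order.
sub select_options {
    my ( $form, $name_re ) = @_;
    my %options;
    for my $input ( $form->inputs ) {
        next unless $input->type eq 'option';      # drop-down lists only
        my $name = $input->name;
        next unless defined $name && $name =~ $name_re;
        my @values = $input->possible_values;
        my @labels = $input->value_names;
        $options{$name} = [ map { [ $values[$_], $labels[$_] ] } 0 .. $#values ];
    }
    return \%options;
}

# Invented sample markup, not the real Autotrader page
my $html = <<'HTML';
<form action="/car-search">
  <input type="text" name="keywords">
  <select name="radius">
    <option value="1500">Distance (national)</option>
    <option value="1">Within 1 mile</option>
  </select>
  <select name="price-to">
    <option value="">(any)</option>
    <option value="500">Up to 500</option>
  </select>
</form>
HTML

my ($form) = HTML::Form->parse( $html, 'http://example.com/' );
my $opts = select_options( $form, qr/^(radius|make|model|price-to)$/ );

for my $name ( sort keys %$opts ) {
    say "input: $name";
    say "\t'$_->[0]' => $_->[1]" for @{ $opts->{$name} };
}
```

Keeping the pairs in an array rather than a hash preserves the order of the options and tolerates duplicate labels.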
Upvotes: 3