tlre0952b

Reputation: 751

How do I pull HTML form option values?

Autotrader's website has a form called 'searchVehicles'. That form contains drop-down lists for the input fields matching qr/^(radius|make|model|price-to)$/. How do I pull the string values of the <option> elements in those lists?

So far I have the following Perl code:

#!/usr/bin/perl

use strict;
use warnings;
use utf8;
use WWW::Mechanize;
use Data::Dumper;
use JSON;

binmode STDOUT, ':encoding(UTF-8)';
binmode STDIN, ':encoding(UTF-8)';

my $url = 'https://www.autotrader.co.uk/';

my $mech = WWW::Mechanize -> new( autocheck => 1 );
$mech -> agent_alias( 'Linux Mozilla');

if ($mech -> status( $mech -> get($url)) == 200)
{
    $mech -> form_name('searchVehicles');
    my @inputs = $mech -> find_all_inputs(
                            name_regex    => qr/^(radius|make|model|price-to)$/,
                            type    =>  'option',);
    print Dumper \@inputs;
};

My result looks like this:

$VAR1 = [
          bless( {
                   'idx' => 1,
                   'type' => 'option',
                   'current' => 0,
                   'name' => 'radius',
                   'menu' => [
                               {
                                 'name' => 'Select the distance',
                                 'seen' => 1,
                                 'value' => ''
                               }
                             ],
                   'id' => 'radius',
                   'aria-label' => 'Choose a radius',
                   'class' => 'c-form__select'
                 }, 'HTML::Form::ListInput' ),
          bless( {
                 }, 'HTML::Form::ListInput' ),
          bless( {
                 }, 'HTML::Form::ListInput' ),
          bless( {
                 }, 'HTML::Form::ListInput' )
        ];

Note: I have truncated all but the first entry, since the others follow the same pattern.

Upvotes: 1

Views: 117

Answers (1)

zdim

Reputation: 66883

The documentation for WWW::Mechanize::find_all_inputs says

find_all_inputs() returns an array of all the input controls in the current form whose properties match all of the regexes passed in. The controls returned are all descended from HTML::Form::Input. See "INPUTS" in HTML::Form for details.

So your @inputs is an array of objects descended from HTML::Form::Input (here HTML::Form::ListInput, as your dump shows), not of plain data.

From the given link to its INPUTS section we find methods like name, value, value_names, and many others. This is what you use to pull the needed information.
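As a minimal offline sketch of those methods, the snippet below parses a small hand-written form with HTML::Form and prints each option's value next to its display text. The HTML, the form name 'demo', and the base URI are made up for illustration and are not taken from the Autotrader page.

```perl
use strict;
use warnings;
use feature 'say';

use HTML::Form;

# Hypothetical markup, standing in for a real page's <select> element
my $html = <<'HTML';
<form name="demo" action="/search">
  <select name="radius">
    <option value="">Select the distance</option>
    <option value="1">Within 1 mile</option>
    <option value="5">Within 5 miles</option>
  </select>
</form>
HTML

# parse() returns one HTML::Form object per <form>; the base URI is a placeholder
my ($form) = HTML::Form->parse( $html, 'http://example.com/' );

# Look up the input by name; for a <select> this is an HTML::Form::ListInput
my $input = $form->find_input('radius');

say "name: ", $input->name;
say "type: ", $input->type;

# possible_values gives the value="..." strings, value_names the visible text
my @values = $input->possible_values;
my @names  = $input->value_names;
say "'$values[$_]' => $names[$_]" for 0 .. $#values;
```

The same calls apply to each element returned by find_all_inputs, since WWW::Mechanize hands back the underlying HTML::Form input objects.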

Update for the actual site that was provided

However, the pages on this site are organized in a way that requires a bit more work. The page at the given URL does not list the option values anywhere in its source (use Ctrl-U to view it); its form merely submits to /car-search.

Once we load that page instead, the approach described above works. The code below only aims to show how to retrieve the information; please adjust it for maintainability.

use warnings;
use strict;
use feature 'say';

use WWW::Mechanize;

use open ':std', ':encoding(UTF-8)';

#my $url = 'https://www.autotrader.co.uk';
my $url = 'https://www.autotrader.co.uk/car-search';

my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get($url);   # with autocheck on, this already dies on a failed request
my $status = $mech->status;   # status() takes no arguments
die "Got $status\n" if $status != 200;

$mech->form_number(2);

my @inputs = $mech -> find_all_inputs(
    name_regex => qr/^(radius|make|model|price-to)$/,
    type       =>  'option'
);

foreach my $input (@inputs) {
    say "input: ", $input->name;
    say join ' ', $input->possible_values;
    say "\t$_" for $input->value_names;
}

Output, with each list truncated for convenience:

input: radius
1500 1 5 10 15 20 25 30 35 40 45 50 55 60 70 80 90 100 200
        Distance (national)
        Within 1 mile
        Within 5 miles
        ...
input: price-to
500 1000 1500 2000 2500 3000 3500 4000 4500 5000 5500 [...]
        (any)
        £500 (934)
        £1,000 (7,683)
        ...

Only radius and price-to are in this form. The other terms of interest are in different HTML elements further down the source, and have to be retrieved in other ways.

Upvotes: 3
