steph

Reputation: 565

Using scrapy's FormRequest no form is submitted

After trying scrapy's first tutorial I was really excited about it, so I wanted to try form submission as well.

I have the following script, but if I print out response.body I am back on the page with the form and nothing has happened. Can anybody help me figure out how to get to the results page?

# spiders/holidaytaxi.py
import scrapy
from scrapy.http import Request, FormRequest
from scrapy.selector import HtmlXPathSelector, Selector


class HolidaytaxiSpider(scrapy.Spider):
    name = "holidaytaxi"
    allowed_domains = ["holidaytaxis.com"]
    start_urls = ['http://holidaytaxis.com/en']

    def parse(self, response): 
        return [FormRequest.from_response(
            response,
            formdata={
                'bookingtypeid':'Return',
                'airpotzgroupid_chosen':'Turkey',
                'pickup_chosen':'Antalya Airport',
                'dropoff_chosen':'Alanya',
                'arrivaldata':'12-07-2015',
                'arrivalhour':'12',
                'arrivalmin':'00',
                'departuredata':'14-07-2015',
                'departurehour':'12',
                'departuremin':'00',
                'adults':'2',
                'children':'0',
                'infants':'0'
            },
            callback=self.parseResponse
        )]

    def parseResponse(self, response):
        print "Hello World"
        print response.status
        print response
        heading = response.xpath('//div/h2')
        print "heading: ", heading

The output is:

2015-07-05 16:23:59 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-07-05 16:24:01 [scrapy] DEBUG: Redirecting (301) to <GET http://www.holidaytaxis.com/en> from <GET http://holidaytaxis.com/en>
2015-07-05 16:24:02 [scrapy] DEBUG: Crawled (200) <GET http://www.holidaytaxis.com/en> (referer: None)
2015-07-05 16:24:03 [scrapy] DEBUG: Crawled (200) <POST http://www.holidaytaxis.com/en/search> (referer: http://www.holidaytaxis.com/en)
Hello World
200
<200 http://www.holidaytaxis.com/en/search>
heading:  []

Upvotes: 2

Views: 1001

Answers (1)

alecxe

Reputation: 473863

The main problem is in how you are passing the booking type, country, pickup and dropoff. You need to pass the corresponding IDs instead of the literal strings.

The following would work in your case:

return FormRequest.from_response(
    response,
    formxpath="//form[@id='transfer_search']",
    formdata={
        'bookingtypeid': '1',
        'airportgroupid': '14',
        'pickup': '121',
        'dropoff': '1076',
        'arrivaldate': '12-07-2015',
        'arrivalhour': '12',
        'arrivalmin': '00',
        'departuredate': '14-07-2015',
        'departurehour': '12',
        'departuremin': '00',
        'adults': '2',
        'children': '0',
        'infants': '0',
        'submit': 'GET QUOTE'
    },
    callback=self.parseResponse
)

Note that I've also fixed the arrivaldate and departuredate parameter names.


You may want to ask how I got these IDs. Good question - I used the browser developer tools and studied the outgoing POST request issued when the search form is submitted:


[Screenshot: the outgoing POST request as captured in the browser developer tools]
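
If you want to double-check what Scrapy itself would send, the scrapy shell is handy for that kind of inspection; the following is just a quick sketch using the same form XPath as above:

scrapy shell "http://www.holidaytaxis.com/en"
>>> from scrapy.http import FormRequest
>>> req = FormRequest.from_response(response, formxpath="//form[@id='transfer_search']")
>>> req.body  # the urlencoded payload Scrapy would POST, including the form's default field values

Comparing that payload with the one shown in the developer tools makes it obvious which field names and values need to be overridden in formdata.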

Now the real problem is how to get these IDs in your Scrapy code. Booking types are easy to handle - there are only 3 types, with IDs from 1 to 3. The list of countries is actually available on the same search form page, in a select tag with id="airportgroupid" - you can construct a mapping dictionary between the country name and its internal ID, e.g.:

countries = {
    option.xpath("@label").extract()[0]: option.xpath("@value").extract()[0]
    for option in response.xpath("//select[@id='airportgroupid']//option")
}

country_id = countries["Turkey"]

It gets more difficult with the pickup and dropoff locations - they depend on the booking type and country, and are retrieved with additional XHR requests to the "http://www.holidaytaxis.com/en/search/getpickup" and "http://www.holidaytaxis.com/en/search/getdropoff" endpoints.
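
If you want to resolve those inside the spider too, you could chain a request to the getpickup endpoint and build a similar name-to-ID mapping from its response, then repeat the idea for getdropoff. The POST field names ("bookingtypeid", "airportgroupid") and the assumption that the endpoint returns an HTML fragment of option elements are guesses on my part - verify both in the network tab. A rough sketch (as methods of the spider above, which already imports FormRequest):

def parse(self, response):
    # map country names to their internal IDs from the search form
    countries = {
        option.xpath("@label").extract()[0]: option.xpath("@value").extract()[0]
        for option in response.xpath("//select[@id='airportgroupid']//option")
    }
    # the field names below are assumed - check the real XHR in the network tab
    yield FormRequest(
        "http://www.holidaytaxis.com/en/search/getpickup",
        formdata={"bookingtypeid": "1", "airportgroupid": countries["Turkey"]},
        callback=self.parse_pickup,
        meta={"country_id": countries["Turkey"]},
    )

def parse_pickup(self, response):
    # assuming an HTML fragment of <option> elements comes back; adapt if it is JSON
    pickups = {
        option.xpath("text()").extract()[0].strip(): option.xpath("@value").extract()[0]
        for option in response.xpath("//option[@value != '']")
    }
    pickup_id = pickups.get("Antalya Airport")
    # response.meta["country_id"] is available here for building the final search
    # FormRequest, and the same pattern works for .../search/getdropoff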

Upvotes: 5
