Scrapy simulate XHR request - returning 400

I'm trying to get data from a site that uses Ajax. The page loads, and then JavaScript requests the content. See this page for details: https://www.tele2.no/mobiltelefon.aspx

The problem is that when I try to simulate this process by calling this URL: https://www.tele2.no/Services/Webshop/FilterService.svc/ApplyPhoneFilters

I get a 400 response telling me that the request is not allowed. This is my code:

# -*- coding: utf-8 -*-
import scrapy
import json

class Tele2Spider(scrapy.Spider):
    name = "tele2"
    #allowed_domains = ["tele2.no/mobiltelefon.aspx"]
    start_urls = (
        'https://www.tele2.no/mobiltelefon.aspx/',
    )

    def parse(self, response):
        url = 'https://www.tele2.no/Services/Webshop/FilterService.svc/ApplyPhoneFilters'
        my_data = "{filters: []}"
        req = scrapy.Request( url, method='POST', body=json.dumps(my_data), headers={'X-Requested-With': 'XMLHttpRequest','Content-Type':'application/json'}, callback=self.parser2)
        yield req

    def parser2(self, response):
      print "test"

I'm new to Scrapy and Python, so there might be something obvious I'm missing.

Upvotes: 1

Views: 7121

Answers (1)

alecxe

Reputation: 473863

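The key problem is the missing quotes around filters in the body. JSON keys must be double-quoted, and since my_data is already a string, json.dumps() wraps the whole thing in a JSON string literal instead of producing an object. Pass the body as valid JSON directly: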

url = 'https://www.tele2.no/Services/Webshop/FilterService.svc/ApplyPhoneFilters'
req = scrapy.Request(url,
                     method='POST',
                     body='{"filters": []}',
                     headers={'X-Requested-With': 'XMLHttpRequest',
                              'Content-Type': 'application/json; charset=UTF-8'},
                     callback=self.parser2)
yield req

Alternatively, define the body as a dictionary and call json.dumps() to serialize it to a string:

params = {"filters": []}
req = scrapy.Request(url,
                     method='POST',
                     body=json.dumps(params),
                     headers={'X-Requested-With': 'XMLHttpRequest',
                              'Content-Type': 'application/json; charset=UTF-8'},
                     callback=self.parser2)
yield req

As proof, here is the console output I get:

2014-12-30 12:30:38-0500 [tele2] DEBUG: Crawled (200) <GET https://www.tele2.no/mobiltelefon.aspx/> (referer: None) 
2014-12-30 12:30:42-0500 [tele2] DEBUG: Crawled (200) <POST https://www.tele2.no/Services/Webshop/FilterService.svc/ApplyPhoneFilters> (referer: https://www.tele2.no/mobiltelefon.aspx/) 
test
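
To see why the original body was rejected, it may help to compare what json.dumps() produces for the string from the question and for a dictionary. This is a minimal standalone sketch using only the standard library; the variable names broken and fixed are illustrative and not part of the spider.

```python
import json

# The body from the question: a Python string that only looks like JSON.
# json.dumps() treats it as a plain string and wraps it in quotes.
broken = json.dumps("{filters: []}")

# A dictionary is serialized to a real JSON object with a quoted key.
fixed = json.dumps({"filters": []})

print(broken)  # "{filters: []}"   (a JSON string literal, quotes included)
print(fixed)   # {"filters": []}

# Decoding confirms the difference in type on the receiving side.
print(type(json.loads(broken)).__name__)  # str
print(type(json.loads(fixed)).__name__)   # dict
```

The first value decodes to a string rather than an object, which is presumably why the service answered with a 400.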

Upvotes: 3
