Andrew Gowa

Reputation: 129

Scraping ajax page with Scrapy?

I'm using Scrapy to scrape data from this page:

https://www.bricoetloisirs.ch/magasins/gardena

The product list is loaded dynamically. I found the URL that the page uses to fetch the products:

https://www.bricoetloisirs.ch/coop/ajax/nextPage/(cpgnum=1&layout=7.01-14_180_69_164_182&uiarea=2&carea=%24ROOT&fwrd=frwd0&cpgsize=12)/.do?page=2&_=1473841539272

But when I request it with Scrapy, it gives me an empty page:

<span class="pageSizeInformation" id="page0" data-page="0" data-pagesize="12">Page: 0 / Size: 12</span>

Here is my code:

# -*- coding: utf-8 -*-
import scrapy

from v4.items import Product


class GardenaCoopBricoLoisirsSpider(scrapy.Spider):
    name = "Gardena_Coop_Brico_Loisirs_py"

    start_urls = [
        'https://www.bricoetloisirs.ch/coop/ajax/nextPage/(cpgnum=1&layout=7.01-14_180_69_164_182&uiarea=2&carea=%24ROOT&fwrd=frwd0&cpgsize=12)/.do?page=2&_=1473841539272'
    ]

    def parse(self, response):
        print(response.body)

Upvotes: 2

Views: 17328

Answers (3)

Andrew Gowa

Reputation: 129

I solved this by requesting the listing page first and then requesting each result page relative to it:

# -*- coding: utf-8 -*-
import scrapy

from v4.items import Product


class GardenaCoopBricoLoisirsSpider(scrapy.Spider):
    name = "Gardena_Coop_Brico_Loisirs_py"

    start_urls = [
        'https://www.bricoetloisirs.ch/magasins/gardena'
    ]

    def parse(self, response):
        # Request result pages 1..49 relative to the listing URL.
        for page in range(1, 50):
            url = response.url + '/.do?page=%s&_=1473841539272' % page
            yield scrapy.Request(url, callback=self.parse_page)

    def parse_page(self, response):
        print(response.body)
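
The spider above hard-codes both the `_` value and the page range. A minimal sketch of two helpers that could loosen that, under two assumptions the thread does not confirm: that `_` is a millisecond timestamp used as a cache-buster (its value looks like one), and that a `data-page="0"` span, as in the question, marks an empty result. The names `page_url` and `served_page` are hypothetical.

```python
import re
import time

BASE_URL = 'https://www.bricoetloisirs.ch/magasins/gardena'


def page_url(base_url, page, now_ms=None):
    """Build the URL for one result page, following the pattern in the answer.

    Assumption: the `_` parameter is a millisecond timestamp (cache-buster).
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return '%s/.do?page=%d&_=%d' % (base_url, page, now_ms)


# Matches the data-page attribute of the pageSizeInformation span
# shown in the question (data-pagesize is deliberately not matched).
_PAGE_RE = re.compile(r'class="pageSizeInformation"[^>]*\bdata-page="(\d+)"')


def served_page(html):
    """Return the page number reported by the span, or None if it is absent."""
    match = _PAGE_RE.search(html)
    return int(match.group(1)) if match else None
```

In `parse`, `page_url(response.url, page)` could replace the string concatenation, and `parse_page` could call `served_page(response.text)` and skip the response when the result is `0` or `None`.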

Upvotes: 3

xanderdin

Reputation: 1

I believe you need to send an additional request, just like a browser does. Try modifying your code as follows:

# -*- coding: utf-8 -*-
import scrapy

from scrapy.http import Request
from v4.items import Product


class GardenaCoopBricoLoisirsSpider(scrapy.Spider):
    name = "Gardena_Coop_Brico_Loisirs_py"

    start_urls = [
        'https://www.bricoetloisirs.ch/coop/ajax/nextPage/'
    ]

    def parse(self, response):
        request_body = '(cpgnum=1&layout=7.01-14_180_69_164_182&uiarea=2&carea=%24ROOT&fwrd=frwd0&cpgsize=12)/.do?page=2&_=1473841539272'
        yield Request(url=response.url, body=request_body, callback=self.parse_page)

    def parse_page(self, response):
        print(response.body)

Upvotes: 0

Urban48

Reputation: 1476

As far as I know, websites use JavaScript to make Ajax calls, and when you use Scrapy the page's JavaScript is not executed.

You will need to take a look at Selenium for scraping those kinds of pages.

Or find out which Ajax calls are being made and send them yourself.
The question "Can scrapy be used to scrape dynamic content from websites that are using AJAX?" may help you as well.
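
To illustrate "send them yourself": a browser's Ajax request usually carries a few headers that a plain Scrapy request does not. The sketch below only builds such a header dict. Whether this particular site checks any of them is an assumption, and `ajax_headers` is a hypothetical helper name.

```python
def ajax_headers(referer):
    """Headers a browser typically attaches to a jQuery-style Ajax request.

    Assumption: the server may inspect these; the thread does not confirm it.
    """
    return {
        'X-Requested-With': 'XMLHttpRequest',
        'Referer': referer,
        'Accept': 'text/html, */*; q=0.01',
    }


headers = ajax_headers('https://www.bricoetloisirs.ch/magasins/gardena')
```

The dict can then be passed to Scrapy as `scrapy.Request(url, headers=headers, callback=self.parse_page)`. The exact headers to copy are best read from the browser's network tab for the real call.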

Upvotes: 2
