Anthony

Reputation: 1

Scrape Javascript-generated page using Scrapy

The following page gives access to product details by executing a Javascript request: http://www.ooshop.com/ContentNavigation.aspx?TO_NOEUD_IDMO=N000000013143&FROM_NOEUD_IDMO=N000000013131&TO_NOEUD_IDFO=81080&NOEUD_NIVEAU=2&UNIVERS_INDEX=3

Each product has the following element:

<a id="ctl00_cphC_pn3T1_ctl01_rp_ctl00_ctl00_lbVisu" class="prodimg" href="javascript:__doPostBack('ctl00$cphC$pn3T1$ctl01$rp$ctl00$ctl00$lbVisu','')"><img id="ctl00_cphC_pn3T1_ctl01_rp_ctl00_ctl00_iVisu" title="Visualiser la fiche détail" class="image" onerror="this.src='/Media/images/null.gif';" src="Media/ProdImages/Produit/Vignettes/3270190199359.gif" alt="Dés de jambon" style="height:70px;width:70px;border-width:0px;margin-top:15px"></a>
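As a side note, the postback target and argument can be pulled out of such an `href` with a regular expression instead of fixed character offsets. This is only a minimal sketch: the names `POSTBACK_RE` and `parse_postback` are made up for illustration, and it assumes the `href` always has the `__doPostBack('target','argument')` shape shown above.

```python
import re

# Matches href values of the form javascript:__doPostBack('TARGET','ARGUMENT').
# POSTBACK_RE and parse_postback are hypothetical helper names.
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


def parse_postback(href):
    """Return (event_target, event_argument) from a __doPostBack href, or None."""
    match = POSTBACK_RE.search(href)
    if match is None:
        return None
    return match.group(1), match.group(2)


# The href of the product link quoted above.
href = "javascript:__doPostBack('ctl00$cphC$pn3T1$ctl01$rp$ctl00$ctl00$lbVisu','')"
print(parse_postback(href))
```

The first element of the returned tuple is the value without surrounding quotes, which is the form the browser would normally send as `__EVENTTARGET`.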

I tried to use FormRequest from the Scrapy library to crawl these pages, but it does not seem to work:

import scrapy
from scrapy.http import FormRequest
from JStest.items import JstestItem


class ooshoptest2(scrapy.Spider):
    name = "ooshoptest2"
    allowed_domains = ["ooshop.com"]
    start_urls = ["http://www.ooshop.com/courses-en-ligne/ContentNavigation.aspx?TO_NOEUD_IDMO=N000000013143&FROM_NOEUD_IDMO=N000000013131&TO_NOEUD_IDFO=81080&NOEUD_NIVEAU=2&UNIVERS_INDEX=3"]

    def parse(self, response):
        URL = response.url
        # Each product link carries a javascript:__doPostBack(...) href.
        path = '//div[@class="blockInside"]//ul/li/a'
        for balise in response.xpath(path):
            jsrequest = response.urljoin(balise.xpath('@href').extract()[0])
            # Assuming urljoin leaves the href unchanged: drop the leading
            # "javascript:__doPostBack('" (25 characters) and the trailing
            # "','')" (5 characters), then wrap the target in single quotes.
            js = "'" + jsrequest[25:-5] + "'"
            data = {'__EVENTTARGET': js, '__EVENTARGUMENT': ''}

            # Post the event target back to the same page.
            yield FormRequest(url=URL,
                              method='POST',
                              callback=self.parse_level1,
                              formdata=data,
                              dont_filter=True)

    def parse_level1(self, response):
        # Print the detail pop-up markup to check whether the postback worked.
        path = '//div[@class="popContent"]'
        test = response.xpath(path)[0].extract()
        print(test)
        item = JstestItem()

        yield item

Does anyone know how to make this work? Many thanks!
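One thing that may be relevant here: ASP.NET WebForms pages usually expect the page's hidden fields (such as `__VIEWSTATE`) to be posted back together with `__EVENTTARGET` and `__EVENTARGUMENT`. Whether that is what is missing on this particular site is an assumption. The sketch below uses only the standard library to show the idea; `HiddenInputCollector`, `build_postback_data` and the sample HTML are hypothetical and not taken from the real page. Note that it collects every hidden input in the document, not only those of one form. In Scrapy, `FormRequest.from_response` pre-populates form fields from the response in a similar way.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def build_postback_data(page_html, event_target, event_argument=""):
    """Build a POST payload: hidden fields plus the postback target/argument."""
    collector = HiddenInputCollector()
    collector.feed(page_html)
    data = dict(collector.fields)
    data["__EVENTTARGET"] = event_target
    data["__EVENTARGUMENT"] = event_argument
    return data


# Hypothetical, shortened page fragment used only for illustration.
sample_html = """
<form method="post" action="ContentNavigation.aspx">
  <input type="hidden" name="__EVENTTARGET" value="" />
  <input type="hidden" name="__EVENTARGUMENT" value="" />
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTIz" />
</form>
"""
data = build_postback_data(
    sample_html, "ctl00$cphC$pn3T1$ctl01$rp$ctl00$ctl00$lbVisu"
)
print(sorted(data))
```

The resulting dictionary could then be passed as `formdata` to a `FormRequest`, with the event target sent without extra quotes around it.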

Upvotes: 0

Views: 289

Answers (0)
