Ashish Kumar

Scraping an ASP page with Python and Scrapy

I'm new to Python and Scrapy.

For my current project, I'm trying to build a scraper that sends a query in a POST request to an ASP page and extracts a <td> value from the results page.

I've written the following code:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        start_urls = ['https://www.bseindia.com/corporates/Forth_Results.aspx']
        download_delay = 1.5

        scrapy.FormRequest.from_response(
            response,
            formdata={
                'ContentPlaceHolder1_SmartSearch_smartSearch': 'TORRENT PHARMACEUTICALS LTD',
                'ctl00$ContentPlaceHolder1$SmartSearch$hdnCode': 500420,
                'ctl00$ContentPlaceHolder1$hf_scripcode': 500420,
                'ctl00$ContentPlaceHolder1$hidCurrentDate': '7/20/2020 12:00:00 AM',
                '__VIEWSTATE': response.css('input#__VIEWSTATE::attr(value)').extract_first(),
                '__VIEWSTATEGENERATOR': response.css('input#__VIEWSTATEGENERATOR::attr(value)').extract.first(),
                '__EVENTVALIDATION': response.css('input#__EVENTVALIDATION::attr(value)').extract.first()
            },
            callback=self.parse,
        )

    def parse(self, response):
        return response.css('tr.TTrow td[2] ::text').extract()

It gives me the following error:

NameError: name 'response' is not defined

I want to run this scraper as a cron job, with the search field (ContentPlaceHolder1_SmartSearch_smartSearch) filled in from a list of names.


Answers (1)

Ryan

You don't have access to a response in start_requests: it runs before any page has been downloaded, so there is no response to build the form request from.

If you move the form-building code into parse, which Scrapy calls with the downloaded start page, it should work:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ['https://www.bseindia.com/corporates/Forth_Results.aspx']
    download_delay = 1.5

    def parse(self, response):
        # Scrapy calls parse() with the downloaded start page, so the form
        # and its hidden ASP.NET fields can be read from the response here.
        formdata = {
            'ContentPlaceHolder1_SmartSearch_smartSearch': 'TORRENT PHARMACEUTICALS LTD',
            'ctl00$ContentPlaceHolder1$SmartSearch$hdnCode': "500420",
            'ctl00$ContentPlaceHolder1$hf_scripcode': "500420",
            'ctl00$ContentPlaceHolder1$hidCurrentDate': '7/20/2020 12:00:00 AM',
            # from_response() already copies hidden inputs such as these
            # from the form, so these three entries are optional.
            '__VIEWSTATE': response.css('input#__VIEWSTATE::attr(value)').extract_first(),
            '__VIEWSTATEGENERATOR': response.css('input#__VIEWSTATEGENERATOR::attr(value)').extract_first(),
            '__EVENTVALIDATION': response.css('input#__EVENTVALIDATION::attr(value)').extract_first()
        }

        # Submit the form as a POST and hand the results page to parse_post().
        return scrapy.FormRequest.from_response(
            response,
            formdata=formdata,
            callback=self.parse_post,
        )

    def parse_post(self, response):
        # Extract the values you need from the results page here.
        data = ...
