Reputation: 2583
I'm trying to scrape data from the following URL:
I've been using the scrapy shell command so I can debug the responses I get back from crawling the site.
When I run response.css('#divSideBar div h3').get(default='')
in the shell, I get an empty result. I ended up going up a level with the following selector... response.css('#divSideBar').get(default='')
and I get back nothing but whitespace characters: \r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t
I can select the elements just fine with the developer tools in Chrome. I also checked the Network tab in Chrome, and the content is coming from the URL I'm scraping:
Is there a way to access the contents of the element with the #divSideBar
id?
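One way to confirm what's happening is to inspect the raw HTML the server actually returns, before any JavaScript runs. A minimal sketch with the standard library (the sample markup below is made up to mirror the symptom, not taken from the real site):

```python
from html.parser import HTMLParser

# Hypothetical markup mimicking the server's static response:
# the #divSideBar container exists but holds only whitespace.
raw_html = '<div id="divSideBar">\r\n\t\t\t\t\r\n\t\t\t</div>'

class SideBarProbe(HTMLParser):
    """Collects the text of any <h3> elements found in the document."""
    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.h3_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'h3':
            self.in_h3 = True

    def handle_endtag(self, tag):
        if tag == 'h3':
            self.in_h3 = False

    def handle_data(self, data):
        if self.in_h3:
            self.h3_texts.append(data)

probe = SideBarProbe()
probe.feed(raw_html)
print(probe.h3_texts)  # [] -- no <h3> in the static HTML, matching the empty selector result
```

If the list comes back empty while Chrome's element inspector shows the headings, the content is being injected client-side, which is exactly the situation the answer below addresses.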
Upvotes: 0
Views: 105
Reputation: 939
Actually, all of the data comes from a dynamic POST request.
What you need to do is send another FormRequest
with the essential parameters, replicating the request you can see under the Network tab in the browser's inspector.
from scrapy import FormRequest

def parse(self, response):
    target_headers = {
        'Accept': '*/*',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'en-US,en;q=0.8,zh-TW;q=0.6,zh;q=0.4',
        'Connection': 'keep-alive',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/55.0.2883.87 Safari/537.36',
        'Referer': response.url,
    }
    # Fill formdata with the form fields shown in the Network tab
    yield FormRequest(response.url, formdata={...}, method='POST',
                      headers=target_headers, meta=response.meta,
                      callback=self.parse_detail)

def parse_detail(self, response):
    # crawl your data here
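For context, FormRequest URL-encodes the formdata dict into the request body (that's what the 'application/x-www-form-urlencoded' content type means). A quick stdlib sketch of the same encoding; the field names here are hypothetical, copy the real ones from the Network tab:

```python
from urllib.parse import urlencode

# Hypothetical form fields -- substitute whatever the site actually posts.
form_fields = {'__EVENTTARGET': 'divSideBar', 'page': '1'}

# This is the same encoding FormRequest applies to formdata before sending.
body = urlencode(form_fields)
print(body)  # __EVENTTARGET=divSideBar&page=1
```

Seeing the encoded body side by side with the payload Chrome shows for the real request is a handy way to verify you've copied all the required parameters.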
Upvotes: 1