Jade Cowan

Reputation: 2583

Scrapy Spider Returns Only White Space Characters

I'm trying to scrape data from the following URL:

https://www.cheyennecity.org/Jobs.aspx?UniqueId=86&From=Professional-86&CommunityJobs=False&JobID=Senior-Planning-Technician-MPO-933

I've been using the scrapy shell command to debug the responses I get back when crawling the site.

When I run response.css('#divSideBar div h3').get(default='') in the terminal, I get an empty string. I then went up a level with response.css('#divSideBar').get(default=''), and all I get is a run of whitespace characters: \r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t

I can select the elements just fine with the developer tools in Chrome. I also checked the Network tab in Chrome, and the content appears to be coming from the URL I'm scraping.

Is there a way to access the contents of the element with the #divSideBar id?

Upvotes: 0

Views: 105

Answers (1)

Pankaj

Reputation: 939

Actually, all the data is loaded by a dynamic POST request, so it is not present in the HTML returned by the initial GET.
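One quick way to confirm this is to check that the markup extracted in the scrapy shell contains no text besides tags and whitespace. The following is only a minimal sketch using the standard library; the helper name has_visible_text and the sample fragment are assumptions made for illustration, not part of Scrapy or of the page itself.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects every text node found in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def has_visible_text(fragment: str) -> bool:
    """Return True if the fragment holds any non-whitespace text (hypothetical helper)."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()
    return any(chunk.strip() for chunk in collector.chunks)


# Assumed sample resembling what response.css('#divSideBar').get() returned.
raw = '<div id="divSideBar">\r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t</div>'
print(has_visible_text(raw))  # False: the container is empty in the initial response
```

If this prints False for the container, the content is being filled in later by another request, and that is the request the spider needs to reproduce.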

What you need to do is send another FormRequest with the essential parameters of that request, which you can see in the request's Headers section under the Network tab of the browser inspector.

from scrapy import FormRequest

def parse(self, response):
    # Headers copied from the POST request shown in the browser's Network tab.
    target_headers = {
        'Accept'         : '*/*',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'en-US,en;q=0.8,zh-TW;q=0.6,zh;q=0.4',
        'Connection'     : 'keep-alive',
        'Content-Type'   : 'application/x-www-form-urlencoded; charset=UTF-8',
        'User-Agent'     : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                           'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
        'Referer'        : response.url,
    }

    # Replace {...} with the form fields the browser sends in the POST request.
    yield FormRequest(response.url, formdata={...}, method='POST',
                      headers=target_headers, meta=response.meta, callback=self.parse_detail)

def parse_detail(self, response):
    # crawl your data here
    pass
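
To fill in the formdata placeholder, one option is to copy the URL-encoded payload of the POST request from the Network tab and convert it to a dict. This is just a sketch under assumptions: the helper name payload_to_formdata and the sample payload string are hypothetical, and the real field names must be taken from the request you see in the browser.

```python
from urllib.parse import parse_qsl


def payload_to_formdata(raw_payload: str) -> dict:
    """Turn a URL-encoded request body into a dict (hypothetical helper)."""
    # keep_blank_values=True keeps fields that the browser sent with empty values.
    return dict(parse_qsl(raw_payload, keep_blank_values=True))


# Hypothetical payload; copy the real one from the Network tab.
raw = "JobID=Senior-Planning-Technician-MPO-933&UniqueId=86&Notes="
formdata = payload_to_formdata(raw)
print(formdata)
```

The resulting dict can be passed as formdata=formdata in the FormRequest. Note that converting to a dict keeps only the last value when a field name repeats, so if the payload contains repeated keys it is safer to keep the list of pairs returned by parse_qsl.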

Upvotes: 1
