Reputation: 2583
I'm trying to scrape data from the following URL:
I've been using the scrapy shell command so I can debug the responses I get back from crawling the site.
When I run response.css('#divSideBar div h3').get(default='')
in the shell, I get an empty result. I ended up going up a level with the following selector... response.css('#divSideBar').get(default='')
and I get back nothing but whitespace characters: \r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t\t\r\n\t\t\t
I can select the elements just fine with the developer tools in Chrome. I also checked the Network tab in Chrome, and the content is coming from the URL I'm scraping:
Is there a way to access the contents of the element with the #divSideBar
id?
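One way to confirm what's happening is to inspect the raw HTML the server actually returns, before any JavaScript runs. A minimal sketch with the standard library (the sample markup below is made up to mirror the symptom, not taken from the real site):

```python
from html.parser import HTMLParser

# Hypothetical markup mimicking the server's static response:
# the #divSideBar container exists but holds only whitespace.
raw_html = '<div id="divSideBar">\r\n\t\t\t\t\r\n\t\t\t</div>'

class SideBarProbe(HTMLParser):
    """Collects the text of any <h3> elements found in the document."""
    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.h3_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'h3':
            self.in_h3 = True

    def handle_endtag(self, tag):
        if tag == 'h3':
            self.in_h3 = False

    def handle_data(self, data):
        if self.in_h3:
            self.h3_texts.append(data)

probe = SideBarProbe()
probe.feed(raw_html)
print(probe.h3_texts)  # [] -- no <h3> in the static HTML, matching the empty selector result
```

If the list comes back empty while Chrome's element inspector shows the headings, the content is being injected client-side, which is exactly the situation the answer below addresses.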
Upvotes: 0
Views: 105
Reputation: 939
Actually, all of the data comes from a dynamic POST request.
What you need to do is send another FormRequest
with the essential parameters, replicating the request you can see under the Network tab in the browser's inspector.
from scrapy import FormRequest

def parse(self, response):
    target_headers = {
        'Accept': '*/*',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'en-US,en;q=0.8,zh-TW;q=0.6,zh;q=0.4',
        'Connection': 'keep-alive',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/55.0.2883.87 Safari/537.36',
        'Referer': response.url,
    }
    # Fill formdata with the form fields shown in the Network tab
    yield FormRequest(response.url, formdata={...}, method='POST',
                      headers=target_headers, meta=response.meta,
                      callback=self.parse_detail)

def parse_detail(self, response):
    # crawl your data here
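For context, FormRequest URL-encodes the formdata dict into the request body (that's what the 'application/x-www-form-urlencoded' content type means). A quick stdlib sketch of the same encoding; the field names here are hypothetical, copy the real ones from the Network tab:

```python
from urllib.parse import urlencode

# Hypothetical form fields -- substitute whatever the site actually posts.
form_fields = {'__EVENTTARGET': 'divSideBar', 'page': '1'}

# This is the same encoding FormRequest applies to formdata before sending.
body = urlencode(form_fields)
print(body)  # __EVENTTARGET=divSideBar&page=1
```

Seeing the encoded body side by side with the payload Chrome shows for the real request is a handy way to verify you've copied all the required parameters.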
Upvotes: 1