I am having trouble figuring out why my secondary function fails to follow the new link and output data. The `parse` function works just fine; it's when it calls back to `parse_puppy` that nothing happens. When I check the JSON output, I see that all of the puppy listing data was successfully scraped, but there's nothing from `parse_puppy`.
On the `yield response.follow_all(...)` line, if I change the method to `follow`, I get results, but it's the same result about a dozen times.
Code:
```python
import scrapy
from scrapy.cmdline import execute


class Spider(scrapy.Spider):
    name = "puppyDetails"

    def start_requests(self):
        urls = ['https://ws.petango.com/webservices/adoptablesearch/wsAdoptableAnimals.aspx?species=Dog&gender=A&agegroup=UnderYear&location=&site=&onhold=A&orderby=name&colnum=3&css=http://ws.petango.com/WebServices/adoptablesearch/css/styles.css&authkey=io53xfw8b0k2ocet3yb83666507n2168taf513lkxrqe681kf8&recAmount=&detailsInPopup=No&featuredPet=Include&stageID=&wmode=opaque']
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # GRAB ALL TOPICAL PUPPY DETAILS
        for animal in response.css("div.list-animal-info-block"):
            yield {
                'puppy_name': animal.css('div.list-animal-name a::text').get(),
                'puppy_id': animal.css('div.list-animal-id::text').get(),
                'puppy_sex': animal.css('div.list-animal-sexSN::text').get(),
                'puppy_breed': animal.css('div.list-animal-breed::text').get(),
                'puppy_age': animal.css('div.list-animal-age::text').get(),
                'puppy_link': animal.css('div.list-animal-name a::attr(href)').get()
            }

        # DIVE INTO DETAILS PAGE
        detail_page = response.css('div.list-animal-name a::attr(href)').get()
        self.logger.info('get puppy details')

        # GO TO THE PUPPY DETAILS
        yield response.follow_all(detail_page, callback=self.parse_puppy)

    def parse_puppy(self, response):
        # GRAB PUPPY DETAILS
        for puppyDetails in response.xpath('//*[@class="detail-table"]//tr'):
            yield {
                'puppy_id': puppyDetails.xpath('//*[@id="lblID"]/text()').extract(),
                'puppy_status': puppyDetails.xpath('//*[@id="lblStage"]/text()').extract(),
                'puppy_intake_date': puppyDetails.xpath('//*[@id="lblIntakeDate"]/text()').extract()
            }


execute(['scrapy', 'crawl', 'puppyDetails'])
```
Error:
ERROR: Spider must return Request, BaseItem, dict or None, got 'generator' in <GET https://ws.petango.com/webservices/adoptablesearch/wsAdoptableAnimals.aspx?species=Dog&gender=A&agegroup=UnderYear&location=&site=&onhold=A&orderby=name&colnum=3&css=http://ws.petango.com/WebServices/adoptablesearch/css/styles.css&authkey=io53xfw8b0k2ocet3yb83666507n2168taf513lkxrqe681kf8&recAmount=&detailsInPopup=No&featuredPet=Include&stageID=&wmode=opaque>
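The error message says the callback produced a `generator`. `response.follow_all()` returns a generator of requests, so `yield response.follow_all(...)` yields that generator object as one single value instead of the requests inside it. A plain-Python sketch of the difference (no Scrapy involved; `fake_follow_all` is a hypothetical stand-in for `follow_all`):

```python
def fake_follow_all(urls):
    # Hypothetical stand-in for response.follow_all: lazily builds one "request" per URL
    return (f"Request({url})" for url in urls)


def callback_with_yield(urls):
    # Yields the generator object itself: one value, and its type is generator
    yield fake_follow_all(urls)


def callback_with_yield_from(urls):
    # Delegates to the generator: yields each "request" it produces
    yield from fake_follow_all(urls)


urls = ["a.aspx", "b.aspx"]
print([type(x).__name__ for x in callback_with_yield(urls)])  # ['generator']
print(list(callback_with_yield_from(urls)))  # ['Request(a.aspx)', 'Request(b.aspx)']
```

Scrapy sees the first form as a callback returning something that is not a Request, item, dict, or None, which matches the error above.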
Answer:
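The line should use `yield from`, so that each request produced by `follow_all` is yielded to Scrapy rather than the generator object itself:

```python
yield from response.follow_all(detail_page, callback=self.parse_puppy)
```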