Reputation: 154
I've written a Python script using Scrapy to get the links from the response after making a POST request to a certain URL. The links come through correctly with the following script.
Working one:
import scrapy
from scrapy.crawler import CrawlerProcess

class AftnetSpider(scrapy.Spider):
    name = "aftnet"
    base_url = "http://www.aftnet.be/MyAFT/Clubs/SearchClubs"

    def start_requests(self):
        yield scrapy.FormRequest(self.base_url,callback=self.parse,formdata={'regions':'1,3,4,6'})

    def parse(self,response):
        for items in response.css("dl.club-item"):
            for item in items.css("dd a[data-toggle='popover']::attr('data-url')").getall():
                yield {"result_url":response.urljoin(item)}

if __name__ == "__main__":
    c = CrawlerProcess({
        'USER_AGENT': 'Mozilla/5.0',
    })
    c.crawl(AftnetSpider)
    c.start()
However, I want to achieve the same result using a list comprehension, but I get an error.
Using list comprehension:
def parse(self,response):
    return [response.urljoin(item) for items in response.css("dl.club-item") for item in items.css("dd a[data-toggle='popover']::attr('data-url')").getall()]
I get the following error:
2019-03-08 12:45:44 [scrapy.core.scraper] ERROR: Spider must return Request, BaseItem, dict or None, got 'str' in <POST http://www.aftnet.be/MyAFT/Clubs/SearchClubs>
How can I get the links using a list comprehension in Scrapy?
Upvotes: 0
Views: 149
Reputation: 16952
Your generator yields a single dict on each iteration of the loop:
yield {"result_url":response.urljoin(item)}
But your list comprehension returns a list of strings, and Scrapy does not accept a plain string as an item, which is what the error message is telling you. I don't know why you want a list comprehension here: your generator is much easier to understand (as shown by the fact that you got it to work and are having trouble with the list comprehension). If you insist on doing it, what you need is a list of dicts, not strings, something like:
return [{"result_url":response.urljoin(item)} for items in response.css("dl.club-item") for item in items.css("dd a[data-toggle='popover']::attr('data-url')").getall()]
But please don't do that. Remember that readability counts. Your generator is readable, your one-liner isn't.
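To see the difference without running a crawl, here is a minimal sketch in plain Python. It assumes nothing about Scrapy itself: `CLUB_LINKS` and its paths are made-up stand-ins for what the CSS selectors return, and `urllib.parse.urljoin` stands in for `response.urljoin`. The function names are hypothetical too.

```python
from urllib.parse import urljoin

# Base URL taken from the question; the relative paths below are invented
# placeholders, not real data from the site.
BASE_URL = "http://www.aftnet.be/MyAFT/Clubs/SearchClubs"
CLUB_LINKS = [
    ["/example/club-a"],
    ["/example/club-b", "/example/club-c"],
]


def parse_with_generator(groups):
    """Mirror of the working spider callback: yields one dict per link."""
    for links in groups:
        for link in links:
            yield {"result_url": urljoin(BASE_URL, link)}


def parse_with_list(groups):
    """List comprehension that builds dicts, the shape Scrapy accepts."""
    return [
        {"result_url": urljoin(BASE_URL, link)}
        for links in groups
        for link in links
    ]


def parse_with_strings(groups):
    """The failing variant: same loop, but each element is a bare string."""
    return [urljoin(BASE_URL, link) for links in groups for link in links]


generator_items = list(parse_with_generator(CLUB_LINKS))
list_items = parse_with_list(CLUB_LINKS)
string_items = parse_with_strings(CLUB_LINKS)

print(generator_items == list_items)   # same dicts in the same order
print(type(string_items[0]).__name__)  # the type named in the error message
```

The generator and the list of dicts produce the same items, so the choice between them is purely about readability. The strings variant is the only one whose elements are not dicts, and that matches the `got 'str'` part of the error.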
Upvotes: 1