max will

Reputation: 31

How to use Python requests with Scrapy?

I am trying to use requests to fetch a page then pass the response object to a parser, but I ran into a problem:

def start_requests(self):
    yield self.parse(requests.get(url))

def parse(self, response):
    # pass

builtins.AttributeError: 'generator' object has no attribute 'dont_filter'

Upvotes: 2

Views: 3486

Answers (3)

Ikram Khan Niazi

Reputation: 809

Calling self.parse(...) returns a generator, so the yield hands Scrapy a generator object instead of a request, before the fetched data is ever processed. You can remove the yield and it should work. I have tested it with a sample URL.

def start_requests(self):
    self.parse(requests.get(url))
def parse(self, response):
    #pass
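
To see why the traceback mentions a generator, here is a minimal sketch in plain Python with no Scrapy involved. The parse function below is a hypothetical stand-in for the spider callback, assuming its real body contains a yield. Calling it does not run the body; it only returns a generator object, which has none of the attributes (such as dont_filter) that Scrapy looks for on a Request.

```python
import types


def parse(response):
    # Hypothetical callback: any function containing `yield`
    # is a generator function.
    yield {"length": len(response)}


# Calling the function only creates a generator; the body has not run yet.
result = parse("<html></html>")

is_generator = isinstance(result, types.GeneratorType)

# A generator has no `dont_filter` attribute, hence the AttributeError
# when Scrapy treats it as a Request.
has_dont_filter = hasattr(result, "dont_filter")

# The body only runs when something iterates over the generator.
items = list(result)
```

So whatever start_requests hands to Scrapy has to be a request object, not the return value of the callback.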

Upvotes: 0

JBJ

Reputation: 1109

What you need to do is:

  1. Get the page with python requests and save it to a variable other than the Scrapy response.

r = requests.get(url)

  2. Replace the Scrapy response body with your python requests text.

response = response.replace(body=r.text)

That's it. Now you have a Scrapy response object with all the data available from python requests.

Upvotes: 0

Umair Ayub

Reputation: 21201

You first need to download the page's response and then convert that string to an HtmlResponse object.

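import requests
from scrapy.http import HtmlResponse

resp = requests.get(url)

# Passing the real URL (rather than "") lets relative links be resolved later
response = HtmlResponse(url=resp.url, body=resp.text, encoding='utf-8')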

Upvotes: 3
