Reputation: 615
This is my first Scrapy project, and its purpose is to scrape the page "http://books.toscrape.com/".
The page contains several links to books.
The idea is to scrape the titles of the books.
Here is the code:
import scrapy

class SpiderSpider(scrapy.Spider):
    name = 'spider'
    allowed_domains = ['http://books.toscrape.com/']
    start_urls = ['http://http://books.toscrape.com//']

    def parse(self, response):
        all_books = response.xpath('//article')
        for book in all_books:
            title = book.xpath('.//h3/a/@title').extract()
            print(title)

SpiderSpider().parse()
Here is the error:
Traceback (most recent call last):
  File "C:/Users/Sayed/PycharmProjects/books/books/spiders/spider.py", line 17, in <module>
    SpiderSpider().parse()
TypeError: parse() missing 1 required positional argument: 'response'
Upvotes: 1
Views: 250
Reputation: 1124
You should not call the parse() method directly. Scrapy calls it automatically when it gets a response. Instead, run the spider with the command line runner. Follow this link for help: Command Line Runner
A Scrapy project already comes with a template. Use that instead of setting things up by hand; it will be easier.
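To illustrate why the direct call fails, here is a minimal sketch in plain Python, with no Scrapy involved. The names FakeResponse, DemoSpider and fake_engine are hypothetical stand-ins invented for this example; they only mimic the idea that the framework builds the response and then passes it to parse().

```python
class FakeResponse:
    """Hypothetical stand-in for the response object a framework would pass in."""

    def __init__(self, url):
        self.url = url


class DemoSpider:
    """Plain class (not a real scrapy.Spider) with the same parse signature."""

    def parse(self, response):
        return f"parsed {response.url}"


def fake_engine(spider, url):
    """Mimics the framework: build a response, then call the callback with it."""
    response = FakeResponse(url)
    return spider.parse(response)


spider = DemoSpider()

# Calling parse() with no argument reproduces the TypeError from the question.
try:
    spider.parse()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# When the "engine" supplies the response, the same method works.
result = fake_engine(spider, "http://books.toscrape.com/")

print(error_message)
print(result)
```

In a real project the engine role is played by Scrapy itself, which is why the spider is started from the command line rather than by calling parse() yourself.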
Upvotes: 0
Reputation: 7558
You should use the following command to execute the script:
scrapy runspider <script_name>.py
(after you remove the SpiderSpider().parse() line)
Upvotes: 1
Reputation: 1473
I see two mistakes in your code:
1. You are not calling the __init__ function of your scrapy.Spider class.
2. You are calling SpiderSpider().parse() directly. As far as I know, the way to run a spider is: $ scrapy crawl [yourSpiderName]
For this particular case:
$ scrapy crawl spider
When doing this, be sure that you are in the same folder as your scrapy.cfg file.
For the first point, the correct way for your spider code is:
import scrapy

class SpiderSpider(scrapy.Spider):
    name = 'spider'

    def __init__(self, *args, **kwargs):
        super(SpiderSpider, self).__init__(*args, **kwargs)
        # allowed_domains takes bare domain names, not full URLs
        self.allowed_domains = ['books.toscrape.com']
        # the URL in the question repeated the "http://" scheme
        self.start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        # each book on the page sits in its own <article> element
        all_books = response.xpath('//article')
        for book in all_books:
            title = book.xpath('.//h3/a/@title').extract()
            print(title)
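If you want to check the title-extraction logic in parse() without running a crawl, here is a rough offline sketch. It swaps in Python's standard xml.etree.ElementTree instead of Scrapy's selectors, so it is not the Scrapy API: ElementTree cannot select attributes with @title in the path, so the attribute is read with .get("title") instead. The SAMPLE markup is a made-up, simplified fragment that I am assuming resembles the structure of the page, not a copy of it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: one <article> per book, title on the link.
SAMPLE = """
<html><body>
  <article class="product_pod">
    <h3><a href="a.html" title="A Light in the Attic">A Light in the ...</a></h3>
  </article>
  <article class="product_pod">
    <h3><a href="b.html" title="Tipping the Velvet">Tipping the Velvet</a></h3>
  </article>
</body></html>
"""


def extract_titles(markup):
    """Collect the title attribute of every h3/a link inside each article."""
    root = ET.fromstring(markup.strip())
    titles = []
    for book in root.findall(".//article"):
        for link in book.findall(".//h3/a"):
            titles.append(link.get("title"))
    return titles


titles = extract_titles(SAMPLE)
print(titles)  # ['A Light in the Attic', 'Tipping the Velvet']
```

Real pages are often not well-formed XML, so this only works on clean sample markup; inside the spider, response.xpath() is the tool to use.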
I hope this points you in the right direction. Also, I don't know how you created this Scrapy project, but Scrapy already comes with project templates, which make it easier to develop fast and reliable solutions. To create a project from the template, use:
$ scrapy startproject [NameOfYourProject]
To generate a new spider, run:
$ cd [NameOfYourProject]
$ scrapy genspider [yourSpiderName] [domain]
Please feel free to ask if you have any questions! :D
Upvotes: 1