Sayed Gouda

Reputation: 615

Scrapy Python TypeError: missing required positional argument

This is my first Scrapy project, and the goal is to scrape this page: http://books.toscrape.com/.

The page contains several links to books.

The idea is to scrape the title of the books.

Here is the code:

import scrapy


class SpiderSpider(scrapy.Spider):
    name = 'spider'
    allowed_domains = ['http://books.toscrape.com/']
    start_urls = ['http://http://books.toscrape.com//']

    def parse(self, response):
        all_books = response.xpath('//article')
        for book in all_books:
            title = book.xpath('.//h3/a/@title').extract()
            print(title)


SpiderSpider().parse()

Here is the error:

Traceback (most recent call last):

 File "C:/Users/Sayed/PycharmProjects/books/books/spiders/spider.py", line 17, in <module>
    SpiderSpider().parse()

TypeError: parse() missing 1 required positional argument: 'response'

Upvotes: 1

Views: 250

Answers (3)

Alok Tripathi

Reputation: 1124

You should not call the parse() method directly; Scrapy calls it automatically once it has downloaded a response. Instead, run the spider with Scrapy's command line tool (see the command line tool section of the Scrapy docs).

A Scrapy project already comes with a spider template; using that instead of running the spider like this will be easier.
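
If you really do want to start the crawl from a plain Python script rather than the scrapy command, Scrapy also provides CrawlerProcess. Here is a minimal sketch, assuming the spider class from the question lives in a module named spider:

from scrapy.crawler import CrawlerProcess

from spider import SpiderSpider  # module name assumed from the question's spider.py

# CrawlerProcess starts the Scrapy engine, which downloads start_urls
# and calls parse(response) for you -- you never call parse() yourself
process = CrawlerProcess()
process.crawl(SpiderSpider)
process.start()  # blocks until the crawl is finished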

Upvotes: 0

Tibebes. M

Reputation: 7558

You should use the following command to execute the script:

scrapy runspider <script_name>.py

(after you remove the SpiderSpider().parse() line)
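
For reference, here is a minimal, self-contained sketch of what spider.py could look like for scrapy runspider (the doubled http:// in start_urls is also corrected here, on the assumption that the intended page is http://books.toscrape.com/ from the question):

import scrapy


class SpiderSpider(scrapy.Spider):
    name = 'spider'
    allowed_domains = ['books.toscrape.com']  # bare domain, no scheme
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        # Scrapy calls parse() itself with each downloaded response
        for book in response.xpath('//article'):
            yield {'title': book.xpath('.//h3/a/@title').extract_first()}

Running scrapy runspider spider.py then executes it without needing a full project.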

Upvotes: 1

EnriqueBet

Reputation: 1473

I see that there are two mistakes in your code:

  1. You are missing the __init__ method in your scrapy.Spider subclass
  2. You are not supposed to call your spider like that (SpiderSpider().parse()); as far as I know, the way to do it is to run:
$ scrapy crawl [yourSpiderName]

For this particular case:

$ scrapy crawl spider

When doing this, make sure you are in the same folder as your scrapy.cfg file.

For the first point, the corrected spider code is:

import scrapy


class SpiderSpider(scrapy.Spider):
    name = 'spider'

    def __init__(self, *args, **kwargs):
        super(SpiderSpider, self).__init__(*args, **kwargs)
        # allowed_domains takes bare domains; start_urls needs a single scheme
        self.allowed_domains = ['books.toscrape.com']
        self.start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        all_books = response.xpath('//article')
        for book in all_books:
            title = book.xpath('.//h3/a/@title').extract()
            print(title)

Hope this points you in the right direction. Also, I don't know how you created this Scrapy project, but Scrapy already comes with templates for your projects, which make it easier to develop fast and reliable solutions. To create a project from the template use:

$ scrapy startproject [NameOfYourProject]

To generate a new spider do:

$ cd [NameOfYourProject]
$ scrapy genspider [yourSpiderName] [yourDomain]
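
For this question, for example, that could be (genspider takes the spider name followed by the domain to crawl):

$ scrapy genspider spider books.toscrape.com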

Please feel free to ask if you have any questions! :D

Upvotes: 1
