VictoriaV

Reputation: 3

Scrapy Crawl ValueError

I am new to Python and to Scrapy. I followed a tutorial to have Scrapy crawl quotes.toscrape.com.

I entered the code exactly as it appears in the tutorial, but I keep getting "ValueError: invalid hostname:" when I run scrapy crawl quotes. I am doing this in PyCharm on a Mac.

I tried both single and double quotes around the URL in the start_urls = [] section, but that did not fix the error.

This is what the code looks like:

import scrapy

class QuoteSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = [
        'http: // quotes.toscrape.com /'
    ]

    def parse(self, response):
        title = response.css('title').extract()
        yield {'titletext':title}

It is supposed to be scraping the site for the title.

This is what the error looks like:

2019-11-08 12:52:42 [scrapy.core.engine] INFO: Spider opened
2019-11-08 12:52:42 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-11-08 12:52:42 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-11-08 12:52:42 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET http:///robots.txt>: invalid hostname: 
Traceback (most recent call last):
  File "/Users/newuser/PycharmProjects/ScrapyTutorial/venv/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py", line 44, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
ValueError: invalid hostname: 
2019-11-08 12:52:42 [scrapy.core.scraper] ERROR: Error downloading <GET http:///%20//%20quotes.toscrape.com%20/>
Traceback (most recent call last):
  File "/Users/newuser/PycharmProjects/ScrapyTutorial/venv/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py", line 44, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
ValueError: invalid hostname: 
2019-11-08 12:52:42 [scrapy.core.engine] INFO: Closing spider (finished)

Upvotes: 0

Views: 789

Answers (1)

gangabass

Reputation: 10666

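Don't put spaces in URLs! With the spaces, the URL has no hostname (your log shows the request going to http:///%20//%20quotes.toscrape.com%20/), which is what raises the ValueError. Remove them: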

start_urls = [
    'http://quotes.toscrape.com/'
]
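
To see why the spaces matter, here is a minimal sketch using only the standard library. It assumes Python 3 (on Python 2.7, as in the traceback, the same function lives in the urlparse module), and has_hostname is a hypothetical helper name, not part of Scrapy. It does not reproduce Scrapy's own request handling; it only shows that the spaced URL parses with an empty network location, which matches the empty hostname in the log.

```python
from urllib.parse import urlparse  # Python 3; on Python 2 use `from urlparse import urlparse`


def has_hostname(url):
    """Return True if the URL parses with a non-empty network location.

    Hypothetical helper for illustration; not a Scrapy API.
    """
    return bool(urlparse(url).netloc)


broken = 'http: // quotes.toscrape.com /'  # URL from the question, with spaces
fixed = 'http://quotes.toscrape.com/'      # URL from the answer, no spaces

for url in (broken, fixed):
    parts = urlparse(url)
    # The parser only treats text right after "scheme://" as the host,
    # so the space after "http:" leaves the network location empty.
    print(repr(url), '-> netloc:', repr(parts.netloc), 'path:', repr(parts.path))
```

A check like has_hostname(url) on each entry of start_urls before running the spider would likely catch this kind of typo early.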

Upvotes: 1
