P.Postrique

Reputation: 135

ValueError: Missing scheme in request url: /favicon.ico

I am trying to crawl a seller's page on Cdiscount with this code:

# -*- coding: utf-8 -*-
import scrapy
import re
import numbers
from cdiscount_test.items import CdiscountTestItem
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

open('item.csv', 'w').close()  # truncate item.csv before the crawl

class CdiscountsellersspiderSpider(scrapy.Spider):
    name = 'CDiscountSellersSpider'
    allowed_domains = ['cdiscount.com']
    start_urls = ['http://www.cdiscount.com/mpvv-47237-EANTECHNOLOGY.html']

    def parse(self, response):
        for sel in response.xpath('//html/body'):
            item = CdiscountTestItem()
            list_urls = sel.xpath('//@href').extract()
            for url in list_urls:
                item['list_url'] = url
                yield scrapy.Request(url, callback=self.parsefeur, meta={'item': item})

    def parsefeur(self, response):
        item = response.request.meta['item']
#etc other lines...

and I always get this error:

raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: /favicon.ico

I found some solutions on this website for the "Missing scheme in request url: h" variant of this error, but none of them resolved my "/favicon.ico" case.

The error is raised at line 58 of __init__.py:

if ':' not in self._url:

But I don't understand this line, so I can't modify it.

Could anybody help me, please?

Upvotes: 0

Views: 515

Answers (1)

Tomáš Linhart

Reputation: 10210

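Pay attention to which elements you select: the XPath //@href matches the href attribute of every element, not just of a elements. A link element in the page head, which is where a favicon is typically referenced, has an href too. I assume here that your intention is to follow only a elements.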
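As a rough illustration of the difference, here is a standard-library sketch. The HTML fragment is invented for the example and is not taken from the Cdiscount page, and HrefCollector is a hypothetical helper name. It only shows that collecting every href also picks up link elements, while filtering on a does not.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, made up for illustration only.
SAMPLE_HTML = """
<html><head>
<link rel="icon" href="/favicon.ico">
<link rel="stylesheet" href="/css/site.css">
</head><body>
<a href="/mpvv-47237-EANTECHNOLOGY.html">Seller</a>
<a href="http://www.cdiscount.com/">Home</a>
</body></html>
"""


class HrefCollector(HTMLParser):
    """Collect (tag, href) pairs for every element that carries an href."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if href is not None:
            self.found.append((tag, href))


collector = HrefCollector()
collector.feed(SAMPLE_HTML)

# Roughly what //@href returns: every href, whatever the element.
all_hrefs = [href for _, href in collector.found]
# Roughly what //a/@href returns: hrefs of a elements only.
anchor_hrefs = [href for tag, href in collector.found if tag == "a"]

print(all_hrefs)
print(anchor_hrefs)
```

In the spider itself, the corresponding change would be to select with sel.xpath('//a/@href') instead of sel.xpath('//@href').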

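Also, you have to be careful with relative links. A relative link such as /favicon.ico has no scheme, which is exactly what the ValueError complains about. Unless you are sure a link is absolute, use the response.urljoin() method to get the absolute URL (see the documentation).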
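To show what resolving a relative link means, here is a minimal sketch using urllib.parse.urljoin from the standard library, which, as far as I know, is what response.urljoin() builds on. The page URL is the one from the question; the list of hrefs is an assumption chosen for illustration.

```python
from urllib.parse import urljoin, urlparse

# Start URL from the question; the hrefs below are illustrative examples.
page_url = "http://www.cdiscount.com/mpvv-47237-EANTECHNOLOGY.html"
hrefs = ["/favicon.ico", "http://www.cdiscount.com/", "other.html"]

# Resolve each href against the page URL. Absolute URLs pass through
# unchanged; relative ones gain the scheme and host of the page.
absolute_urls = [urljoin(page_url, href) for href in hrefs]

for href, absolute in zip(hrefs, absolute_urls):
    print(href, "->", absolute, "| scheme:", urlparse(absolute).scheme)
```

In the spider, the same idea would look like yield scrapy.Request(response.urljoin(url), callback=self.parsefeur, meta={'item': item}), so that every request URL carries a scheme.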

Upvotes: 1
