Reputation: 51
I'm unable to collect links properly; I keep getting partial links from the page. How do I get my parser to work?
import scrapy

class GlobaldriveruSpider(scrapy.Spider):
    name = 'globaldriveru'
    allowed_domains = ['globaldrive.ru']
    start_urls = ['https://globaldrive.ru/moskva/motory/?items_per_page=500']

    def parse(self, response):
        links = response.xpath('//div[@class="ty-grid-list__item-name"]/a/@href').get()
        for link in links:
            yield scrapy.Request(response.urljoin(link), callback=self.parse_products, dont_filter=True)
            #yield scrapy.Request(link, callback=self.parse_products, dont_filter=True)

    def parse_products(self, response):
        # for parse_products in response.xpath('//div[contains(@class, "container-fluid products_block_page")]'):
        item = dict()
        item['title'] = response.xpath('//h1[@class="ty-product-block-title"]/text()').extract_first()
        yield item
Here is part of the output log:
[]
2019-04-29 16:21:12 [scrapy.core.engine] INFO: Spider opened
2019-04-29 16:21:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-04-29 16:21:12 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6024
2019-04-29 16:21:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://globaldrive.ru/robots.txt> (referer: None)
2019-04-29 16:21:17 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://globaldrive.ru/moskva/motory/?items_per_page=500> (referer: None)
2019-04-29 16:21:17 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET http://globaldrive.ru/h/> from <GET https://globaldrive.ru/h>
2019-04-29 16:21:17 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET http://globaldrive.ru/-/> from <GET https://globaldrive.ru/->
2019-04-29 16:21:18 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET http://globaldrive.ru/%d0%b9/> from <GET https://globaldrive.ru/%D0%B9>
2019-04-29 16:21:18 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET http://globaldrive.ru/%d1%80/> from <GET https://globaldrive.ru/%D1%80>
Upvotes: 1
Views: 72
Reputation: 3717
Replace .get() with .extract() in the parse function. Right now you're iterating through one link letter by letter, but you just need to extract all the links.
def parse(self, response):
    links = response.xpath('//div[@class="ty-grid-list__item-name"]/a/@href').extract()  # <- here
    for link in links:
        yield scrapy.Request(response.urljoin(link), self.parse_products)
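To make the difference concrete, here is a small stand-alone sketch (the hrefs are made up, not from the actual page) of what the for-loop iterates over in each case:

```python
# Hypothetical hrefs standing in for what the XPath query matches.
hrefs = ['/moskva/motory/item-1/', '/moskva/motory/item-2/']

# .get() / .extract_first() return only the FIRST match, as a plain string...
first = hrefs[0]
# ...so "for link in links" walks it character by character, producing the
# one-letter "URLs" (/h, /-, ...) visible in the redirect log above.
print(list(first)[:3])   # ['/', 'm', 'o']

# .extract() / .getall() return EVERY match as a list of strings,
# so the loop yields one full relative link per iteration.
for link in hrefs:
    print(link)
```

In current Scrapy, .getall() is the preferred spelling of .extract(), and .get() of .extract_first(); either pair works, as long as you use the list-returning one when you intend to loop.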
Upvotes: 1