Tribic

Reputation: 108

Scrapy: How to get a list of urls and loop over them afterwards

I'm new to Python and Scrapy. I've watched a few Udemy and YouTube tutorials and am now trying my first example of my own. I know how to loop when there's a next button, but in my case there is none.

Here's my code. It works on one of the urls, but the start url needs to be changed later:

import scrapy

from ..items import Heroes1Item  # the item defined in the project's items.py


class Heroes1JobSpider(scrapy.Spider):
    name = 'heroes1_job'

    # where to extract
    allowed_domains = ['icy-veins.com']
    start_urls = ['https://www.icy-veins.com/heroes/alarak-build-guide']

    def parse(self, response):
        # what to extract
        hero_names = response.xpath('//span[@class="page_breadcrumbs_item"]/text()').extract()
        hero_buildss = response.xpath('//h3[@class="toc_no_parsing"]/text()').extract()
        hero_buildskillss = response.xpath('//span[@class="heroes_build_talent_tier_visual"]').extract()

        for item in zip(hero_names, hero_buildss, hero_buildskillss):
            new_item = Heroes1Item()
            new_item['hero_name'] = item[0]
            new_item['hero_builds'] = item[1]
            new_item['hero_buildskills'] = item[2]
            yield new_item
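For reference, Heroes1Item lives in the project's items.py and just declares one field per key set above, roughly like this:

import scrapy


class Heroes1Item(scrapy.Item):
    # one field per key assigned in the spider
    hero_name = scrapy.Field()
    hero_builds = scrapy.Field()
    hero_buildskills = scrapy.Field()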

But this is only one hero, and I want about 90 of them. Each url depends on the hero name. I can get the list of urls like this:

start_urls = ['https://www.icy-veins.com/heroes/assassin-hero-guides']

...

response.xpath('//div[@class="nav_content_block_entry_heroes_hero"]/a/@href').extract()

But I don't know how to store this list so that the parse function loops over the urls in it.

Thanks in advance!

Upvotes: 1

Views: 1617

Answers (1)

vezunchik

Reputation: 3717

Is it critical to parse them in the parse function? You can collect your hero list in one function and then iterate over it to scrape the hero data, like this:

from scrapy import Request
...

start_urls = ['https://www.icy-veins.com/heroes/assassin-hero-guides']

def parse(self, response):
    heroes_xpath = '//div[@class="nav_content_block_entry_heroes_hero"]/a/@href'
    for link in response.xpath(heroes_xpath).extract():
        # follow every hero link and let parse_hero extract the items
        yield Request(response.urljoin(link), callback=self.parse_hero)

def parse_hero(self, response):
    # copying your method here
    hero_names = response.xpath('//span[@class="page_breadcrumbs_item"]/text()').extract()
    hero_buildss = response.xpath('//h3[@class="toc_no_parsing"]/text()').extract()
    hero_buildskillss = response.xpath('//span[@class="heroes_build_talent_tier_visual"]').extract()

    for item in zip(hero_names, hero_buildss, hero_buildskillss):
        new_item = Heroes1Item()
        new_item['hero_name'] = item[0]
        new_item['hero_builds'] = item[1]
        new_item['hero_buildskills'] = item[2]
        yield new_item
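If you later also need something from the listing page (for example the hero name as it is written in the link text), you can pass it along in the request's meta dict and read it back in parse_hero. A quick sketch, assuming the link text actually holds the hero name:

from scrapy import Request

def parse(self, response):
    for link in response.xpath('//div[@class="nav_content_block_entry_heroes_hero"]/a'):
        url = link.xpath('./@href').extract_first()
        name = link.xpath('normalize-space(.)').extract_first()
        # the value travels with the request and is available in the callback
        yield Request(response.urljoin(url), callback=self.parse_hero,
                      meta={'hero_name': name})

def parse_hero(self, response):
    hero_name = response.meta['hero_name']
    # ... extract the builds as before ...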

Upvotes: 2
