Reputation: 827
I've started learning Python, and I'm loving it so far. I keep looking at different libraries and whatnot, and so I stumbled upon Scrapy and thought I would give it a try. I wanted to get all the links from the daylerees colour schemes (from GitHub) and dump them somewhere for quick access.
So I did this:
import scrapy


class ThemeItem(scrapy.Item):
    name = scrapy.Field()
    link = scrapy.Field()


class ThemeSpider(scrapy.Spider):
    name = 'themespider'
    start_urls = ['https://github.com/daylerees/colour-schemes/tree/master/jetbrains']

    def parse(self, response):
        for sel in response.xpath('//a[@class="js-directory-link"]'):
            url = ThemeItem()
            url['name'] = sel.xpath('text()')
            url['link'] = sel.xpath('@href')
            yield url
And it isn't outputting anything at all. Any guidance would be much appreciated.
I'm running it like this:
scrapy runspider spider.py
Upvotes: 1
Views: 105
Reputation: 473873
The elements containing the js-directory-link class also have other classes, for example:
<a href="/daylerees/colour-schemes/tree/master/jetbrains/contrast" class="js-directory-link js-navigation-open" id="c8fd07f040a8f2dc85f5b2d3804ea3db-6b332f6820ec47d7ade641dbf72108b025b10440" title="contrast">contrast</a>
Your XPath requires the class attribute to equal "js-directory-link" exactly, so it matches nothing. You need a partial class-attribute match via contains():
//a[contains(@class, "js-directory-link")]
Or, you may use CSS selectors:
for sel in response.css('a.js-directory-link'):
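To see why the exact-match XPath returns nothing while a partial match works, here is a quick standard-library sketch (no Scrapy needed) that applies the same "class list contains js-directory-link" test to an abbreviated copy of the anchor from the page; the snippet of HTML is an assumption trimmed from the element shown above:

```python
from html.parser import HTMLParser

# Abbreviated stand-in for the GitHub markup quoted above (assumption).
HTML = '''
<a href="/daylerees/colour-schemes/tree/master/jetbrains/contrast"
   class="js-directory-link js-navigation-open" title="contrast">contrast</a>
'''

class DirectoryLinkParser(HTMLParser):
    """Collects (text, href) for <a> tags whose class list contains
    js-directory-link -- the same partial match that contains() performs."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An exact comparison, attrs.get('class') == 'js-directory-link',
        # would fail here: the attribute holds several classes.
        if tag == 'a' and 'js-directory-link' in attrs.get('class', '').split():
            self._href = attrs.get('href')

    def handle_data(self, data):
        if self._href is not None:
            self.links.append((data, self._href))
            self._href = None

parser = DirectoryLinkParser()
parser.feed(HTML)
print(parser.links)
# [('contrast', '/daylerees/colour-schemes/tree/master/jetbrains/contrast')]
```

Scrapy's @class="js-directory-link" predicate is the exact comparison; contains(@class, ...) and the a.js-directory-link CSS selector both behave like the split-and-membership test above.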
Though I would really think about using the GitHub API instead.
Upvotes: 1