Such Much Code

Reputation: 827

Python Scrapy not doing anything

I've started learning Python, and I'm loving it so far. I keep looking at different libraries and whatnot, so when I stumbled upon Scrapy I thought I would give it a try. I wanted to get all the links from daylerees' colour schemes (on GitHub) and dump them somewhere for quick access.

So I did this:

import scrapy


class ThemeItem(scrapy.Item):
    name = scrapy.Field()
    link = scrapy.Field()


class ThemeSpider(scrapy.Spider):
    name = 'themespider'
    start_urls = ['https://github.com/daylerees/colour-schemes/tree/master/jetbrains']

    def parse(self, response):
        for sel in response.xpath('//a[@class="js-directory-link"]'):
            url = ThemeItem()
            url['name'] = sel.xpath('text()')
            url['link'] = sel.xpath('@href')

            yield url

But it does not output anything at all. Any guidance would be much appreciated.


I'm running it like this: scrapy runspider spider.py

Upvotes: 1

Views: 105

Answers (1)

alecxe

Reputation: 473873

The elements with the js-directory-link class also have other classes, for example:

<a href="/daylerees/colour-schemes/tree/master/jetbrains/contrast" class="js-directory-link js-navigation-open" id="c8fd07f040a8f2dc85f5b2d3804ea3db-6b332f6820ec47d7ade641dbf72108b025b10440" title="contrast">contrast</a>

You need to use a partial class attribute match via contains():

//a[contains(@class, "js-directory-link")]

Or, you may use CSS selectors:

for sel in response.css('a.js-directory-link'):
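To see why the exact comparison finds nothing, here is a minimal sketch that uses only the standard library's ElementTree as a stand-in for Scrapy's selectors. It parses the anchor element quoted above (wrapped in a hypothetical <div> root, with the id attribute left out for brevity) and compares an exact @class match with a token-based check. ElementTree's limited XPath has no contains(), so the token check is written in plain Python; it plays the role that contains() or the CSS selector plays in Scrapy.

```python
import xml.etree.ElementTree as ET

# The anchor element from the answer, wrapped in a root element (an assumption
# made only so the fragment parses as well-formed XML).
SNIPPET = (
    '<div>'
    '<a href="/daylerees/colour-schemes/tree/master/jetbrains/contrast" '
    'class="js-directory-link js-navigation-open" title="contrast">contrast</a>'
    '</div>'
)

root = ET.fromstring(SNIPPET)

# Exact comparison: the whole attribute value must equal "js-directory-link",
# but the real value is "js-directory-link js-navigation-open".
exact = root.findall(".//a[@class='js-directory-link']")

# Token-based check: split the attribute on whitespace and look for the class.
by_token = [
    a for a in root.iter("a")
    if "js-directory-link" in a.get("class", "").split()
]

exact_count = len(exact)
links = [(a.text, a.get("href")) for a in by_token]

print(exact_count)  # no element matches the exact comparison
print(links)        # the token-based check finds the "contrast" link
```

Note that in the original spider the item fields are assigned selector objects rather than strings. In Scrapy the values would normally still need to be extracted, for example with .get() on the result of sel.xpath('text()') in recent versions.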

That said, I would seriously consider using the GitHub API instead.
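As a rough sketch of the API route, the code below builds a request to GitHub's repository contents endpoint and turns the JSON listing into name/link pairs. The helper names (contents_url, extract_theme_links, fetch_theme_links) and the User-Agent string are made up for illustration, and the owner, repository, and path are assumed from the URL in the question. Unauthenticated requests are rate-limited, so treat this as a starting point rather than a finished client.

```python
import json
import urllib.request


def contents_url(owner, repo, path):
    """Build the URL of the repository contents endpoint for a directory."""
    return f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"


def extract_theme_links(entries):
    """Reduce a contents listing to the name and browser link of each entry."""
    return [{"name": e["name"], "link": e["html_url"]} for e in entries]


def fetch_theme_links(owner="daylerees", repo="colour-schemes", path="jetbrains"):
    """Fetch the directory listing and return its entries as name/link dicts.

    The default arguments are assumptions taken from the question's start URL.
    """
    request = urllib.request.Request(
        contents_url(owner, repo, path),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "theme-link-example",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return extract_theme_links(json.load(response))
```

Calling fetch_theme_links() performs the network request and returns a list of dictionaries, which can then be printed or written to a file.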

Upvotes: 1
