noel


Scrapy returning no results

I'm new to Scrapy. I'm trying to scrape Indeed's job site for a project I'm working on. I'm slowly learning selector syntax by using Google Chrome's Inspect tool and then pressing Ctrl+F to test selectors in the Elements panel. I followed along with this tutorial:

https://www.digitalocean.com/community/tutorials/how-to-crawl-a-web-page-with-scrapy-and-python-3

I'm stuck trying to get the 16 listings on each page. I can see that each listing's company name can be selected with:

//span[@class="company"]/a/text()

Here is my code up to this point:

import scrapy

class IndeedSpider(scrapy.Spider):
    name='indeed_jobs'
    start_urls = ['https://www.indeed.com/jobs?q=software%20engineer&l=Portland%2C%20OR']

    def parse(self, response):
        SET_SELECTOR = '.jobsearch-SerpJobCard'
        for jobListing in response.css(SET_SELECTOR):
            pass

This returns nothing. I expected 16 rows, so I assume my SET_SELECTOR is incorrect. Any help would be really appreciated!


Answers (1)

malberts


Your selector is correct. Note that SET_SELECTOR is not a Scrapy-specific variable: you can name it anything, or pass the selector string directly to response.css(). It is also not the reason nothing is returned.

Nothing is returned because your code never hands anything back to Scrapy. The for loop does find each job section, but its body is just pass, so nothing is yielded.
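A minimal plain-Python sketch of the difference, runnable without Scrapy. The list of dicts is a hypothetical stand-in for the selectors that response.css() would return. The version with pass is an ordinary function that returns None, while the version with yield is a generator that produces one item per card; Scrapy collects whatever a callback like parse yields as scraped items.

```python
# Hypothetical stand-ins for the job cards a CSS selector would match.
job_cards = [{"company": " Acme "}, {"company": "Globex"}]


def parse_with_pass(cards):
    """Visits every card but hands nothing back (implicitly returns None)."""
    for card in cards:
        pass


def parse_with_yield(cards):
    """A generator: yields one dict per card, which is what Scrapy collects."""
    for card in cards:
        yield {"company": card["company"].strip()}


print(parse_with_pass(job_cards))         # None
print(list(parse_with_yield(job_cards)))  # [{'company': 'Acme'}, {'company': 'Globex'}]
```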

Here is an example of it getting the company for each job:

import scrapy

class IndeedSpider(scrapy.Spider):
    name='indeed_jobs'
    start_urls = ['https://www.indeed.com/jobs?q=software%20engineer&l=Portland%2C%20OR']

    def parse(self, response):
        SET_SELECTOR = '.jobsearch-SerpJobCard'
        for jobListing in response.css(SET_SELECTOR):
            # Yield is necessary to return scraped data.
            yield {
                # And here you get a value from each job.
                'company': jobListing.xpath('.//span[@class="company"]/a/text()').get('').strip()
            }

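Note the .// at the beginning of the XPath. An XPath that starts with // (no leading dot) searches the whole document, not just the current job card; the Scrapy documentation on working with relative XPaths explains this. I also passed a default of '' to get() for when that field is missing, so that strip() is called on an empty string instead of on None, which would raise an AttributeError.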

However, I suggest you work through the official Scrapy tutorial first, as the parts you are missing will be explained there: https://docs.scrapy.org/en/latest/intro/tutorial.html
