Reputation: 452
I'm new to Scrapy. I'm trying to scrape Indeed's job site for a project that I'm working on. I am slowly learning the selector syntax by using Google Chrome's Inspect tool and then pressing Ctrl+F. I followed along with this tutorial:
https://www.digitalocean.com/community/tutorials/how-to-crawl-a-web-page-with-scrapy-and-python-3
I'm basically stuck trying to get my 16 listings per page. I can see that it normally starts with:
//span[@class="company"]/a/text()
Here is my code up to this point:
import scrapy


class IndeedSpider(scrapy.Spider):
    name = 'indeed_jobs'
    start_urls = ['https://www.indeed.com/jobs?q=software%20engineer&l=Portland%2C%20OR']

    def parse(self, response):
        SET_SELECTOR = '.jobsearch-SerpJobCard'
        for jobListing in response.css(SET_SELECTOR):
            pass
This is returning nothing. I'd expect 16 rows, so my SET_SELECTOR is incorrect. Help would be really appreciated!
Upvotes: 0
Views: 1030
Reputation: 2536
Your selector works correctly. SET_SELECTOR is not a Scrapy-specific variable, though. You can call it anything, or even put your selector string directly in the function call. It is also not the reason why nothing is returned.
It is returning nothing because you did not instruct it to return anything. In your current code it will find each job section (in the for loop), but then you tell it to do nothing (pass).
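To make the difference concrete, here is a minimal plain-Python sketch (no Scrapy involved). The function names and the sample list are hypothetical stand-ins for a parse callback and the matched job cards; the point is only that a loop body of pass produces nothing, while yield hands an item back for every listing.

```python
# Hypothetical stand-in for the job cards a selector would match.
cards = ["  Acme Corp ", "Globex"]


def parse_with_pass(cards):
    # Visits every card but does nothing with it, so the function returns None.
    for card in cards:
        pass


def parse_with_yield(cards):
    # Yields one dict per card, which makes this function a generator.
    for card in cards:
        yield {"company": card.strip()}


print(parse_with_pass(cards))
print(list(parse_with_yield(cards)))
```

Scrapy consumes whatever the callback yields in the same way, so a callback that never yields or returns anything gives it no items to collect.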
Here is an example of it getting the company for each job:
import scrapy


class IndeedSpider(scrapy.Spider):
    name = 'indeed_jobs'
    start_urls = ['https://www.indeed.com/jobs?q=software%20engineer&l=Portland%2C%20OR']

    def parse(self, response):
        SET_SELECTOR = '.jobsearch-SerpJobCard'
        for jobListing in response.css(SET_SELECTOR):
            # Yield is necessary to return scraped data.
            yield {
                # And here you get a value from each job.
                'company': jobListing.xpath('.//span[@class="company"]/a/text()').get('').strip()
            }
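Note the use of .// at the beginning of the XPath. A path that starts with // is absolute to the whole document, while .// is relative to the current job listing; the reason is explained in the documentation. I also added a default '' in get() for when that field is missing (docs) so that strip() does not throw an error.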
However, I suggest you work through the official Scrapy tutorial first, as the parts you are missing will be explained there: https://docs.scrapy.org/en/latest/intro/tutorial.html
Upvotes: 2