Reputation: 57
I am building a spider with Scrapy. I want to follow every item in a list and then scrape all the data behind each link, but when I run the spider it doesn't scrape any data. What am I missing?
```python
import scrapy
from scrapy.linkextractors import LinkExtractor

from ..items import JobscraperItem


class JobscraperSpider(scrapy.Spider):
    name = 'jobspider'
    start_urls = ['https://cccc/bolsa/ofertas?oferta=&lugar=&categoria=']

    def parse(self, response):
        job_detail = response.xpath('//div[@class="list"]/div/a')
        yield from response.follow_all(job_detail, self.parse_jobspider)

    def parse(self, response):
        items = JobscraperItem()
        job_title = response.xpath('//h1/text()').extract()
        company = response.xpath('//h2/b/text()').extract()
        company_url = response.xpath('//div[@class="pull-left"]/a/text()').extract()
        description = response.xpath('//div[@class="aviso"]/text()').extract()
        salary = response.xpath('//div[@id="aviso"]/p[1]/text()').extract()
        city = response.xpath('//div[@id="aviso"]/p[2]/text()').extract()
        district = response.xpath('//div[@id="aviso"]/p[5]/text()').extract()
        publication_date = response.xpath('//div[@id="publicado"]/text()').extract()
        apply = response.xpath('//p[@class="text-center"]/b/text()').extract()
        job_type = response.xpath('//div[@id="resumen"]/p[3]/text()').extract()
        items['job_title'] = job_title
        items['company'] = company
        items['company_url'] = company_url
        items['description'] = description
        items['salary'] = salary
        items['city'] = city
        items['district'] = district
        items['publication_date'] = publication_date
        items['apply'] = apply
        items['job_type'] = job_type
        yield items
```
Upvotes: 0
Views: 102
Reputation: 57
```python
rules = (
    Rule(LinkExtractor(allow=('/bolsa/166',)), follow=True, callback='parse_item'),
)
```
I resolved this by adding this code, which follows every matching link and scrapes the data inside.
Upvotes: 0
Reputation: 448
From what I can see, one of the issues is that you are defining two methods called `parse()`. Since your first `parse` method passes `self.parse_jobspider` as the callback, I'm guessing that your second `parse` method is named incorrectly and should be `parse_jobspider`.
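To see why the duplicate name matters, here is a small standalone sketch in plain Python (no Scrapy needed). The class `Demo` and its return values are made up for illustration; the point is only that a second `def` with the same name in a class body replaces the first.

```python
class Demo:  # hypothetical class, for illustration only
    def parse(self):
        return "first"

    # Same name again: this definition replaces the one above,
    # so the first parse() is no longer reachable.
    def parse(self):
        return "second"


result = Demo().parse()
print(result)

# Nothing named parse_jobspider was ever defined on the class.
print(hasattr(Demo, "parse_jobspider"))
```

In the spider from the question, this means the link-following `parse` is discarded, and `self.parse_jobspider` does not exist at all.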
Also, are you sure that the URL in `start_urls` is correct? https://cccc/bolsa/ofertas?oferta=&lugar=&categoria= doesn't resolve to anything, which would also explain why no data is being scraped.
Upvotes: 1