Reputation: 7
I have written a simple scraper using Scrapy, but it keeps returning the first instance of the target data instead of the correct data for each row. In this case, it returns the first link for every scraped job from the Indeed website, instead of the correct link for each job.
I've tried the XPath both without (div) and with (.//div) the leading .//, as well as using [0] at the end of the line. Without [0], it returns all data from all rows in each cell.
A link to an example of the source data: https://www.indeed.co.uk/jobs?as_and=a&as_phr=&as_any=&as_not=IT+construction&as_ttl=Project+Manager&as_cmp=&jt=contract&st=&salary=%C2%A330K-%C2%A3460K&radius=25&fromage=2&limit=50&sort=date&psf=advsrch
Target data is href="/rc/clk?jk=56e4f5164620b6da&fccid=6920a3604c831610&vjs=3"
<div class="title">
<a target="_blank" id="jl_56e4f5164620b6da" href="/rc/clk?jk=56e4f5164620b6da&fccid=6920a3604c831610&vjs=3" onmousedown="return rclk(this,jobmap[0],1);" onclick=" setRefineByCookie(['radius', 'jobtype', 'salest']); return rclk(this,jobmap[0],true,1);" rel="noopener nofollow" title="Project Manager" class="jobtitle turnstileLink " data-tn-element="jobTitle">
<b>Project</b> <b>Manager</b></a>
def parse(self, response):
    titles = response.css('div.jobsearch-SerpJobCard')
    items = []
    for title in titles:
        item = ICcom4Item()
        home_url = ("http://www.indeed.co.uk")
        item ['role_title_link'] = titles.xpath('div[@class="title"]/a/@href').extract()[0]
        items.append(item)
    return items
I just need the correct link from each job to appear. All help welcome!
Upvotes: 0
Views: 26
Reputation: 3403
The problem is in the line below:
item ['role_title_link'] = titles.xpath('div[@class="title"]/a/@href').extract()[0]
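Instead of titles.xpath, you should use title.xpath. titles is the whole list of job cards, so titles.xpath(...) collects the links from every card and [0] always picks the first one. title is the single card for the current loop iteration, so the query runs against that card only, like below: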
item ['role_title_link'] = title.xpath('div[@class="title"]/a/@href').extract()[0]
Then, your code will scrape the link for each job, as you want.
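To see why querying the loop variable matters, here is a minimal sketch that runs without Scrapy. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors, and the two-card markup, the BASE_URL constant and the extract_links helper are all hypothetical, made up for illustration rather than taken from the real Indeed page. It also joins each relative href onto the site root, which seems to be what the unused home_url in the question was meant for.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical, simplified markup with two job cards (not real Indeed data).
HTML = """
<html><body>
  <div class="jobsearch-SerpJobCard">
    <div class="title"><a href="/rc/clk?jk=aaa">Project Manager</a></div>
  </div>
  <div class="jobsearch-SerpJobCard">
    <div class="title"><a href="/rc/clk?jk=bbb">Programme Manager</a></div>
  </div>
</body></html>
""".strip()

# Assumed site root, mirroring home_url in the question.
BASE_URL = "http://www.indeed.co.uk"


def extract_links(markup):
    """Return one absolute link per job card."""
    root = ET.fromstring(markup)
    # Select every job card, like response.css('div.jobsearch-SerpJobCard').
    cards = root.findall(".//div[@class='jobsearch-SerpJobCard']")
    links = []
    for card in cards:
        # Query relative to the current card, not the whole collection.
        anchor = card.find("div[@class='title']/a")
        if anchor is not None:
            # Turn the relative href into an absolute URL.
            links.append(urljoin(BASE_URL, anchor.get("href")))
    return links


links = extract_links(HTML)
for link in links:
    print(link)
```

In the actual spider the same idea is just title.xpath(...) inside the loop. Scrapy responses also provide response.urljoin(href), which should do the job of urljoin here if you want absolute links in the item.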
Upvotes: 1