Jamwg

Reputation: 7

Scrapy returning same first row data in each row instead of separate data for each row

I have written a simple scraper using Scrapy, but every row contains the first match of the target data instead of the data belonging to that row. In this case, it returns the first job's link for all scraped jobs from the Indeed website, instead of the correct link for each job.

I've tried both forms of the path (div and .//div), as well as using [0] at the end of the line. Without [0], it returns all data from all rows in each cell.

An example of the source data is at: https://www.indeed.co.uk/jobs?as_and=a&as_phr=&as_any=&as_not=IT+construction&as_ttl=Project+Manager&as_cmp=&jt=contract&st=&salary=%C2%A330K-%C2%A3460K&radius=25&fromage=2&limit=50&sort=date&psf=advsrch

Target data is href="/rc/clk?jk=56e4f5164620b6da&fccid=6920a3604c831610&vjs=3"

Target data from page

<div class="title">
    <a target="_blank" id="jl_56e4f5164620b6da" href="/rc/clk?jk=56e4f5164620b6da&amp;fccid=6920a3604c831610&amp;vjs=3" onmousedown="return rclk(this,jobmap[0],1);" onclick=" setRefineByCookie(['radius', 'jobtype', 'salest']); return rclk(this,jobmap[0],true,1);" rel="noopener nofollow" title="Project Manager" class="jobtitle turnstileLink " data-tn-element="jobTitle">
        <b>Project</b> <b>Manager</b></a>

Here's my code

def parse(self, response):
    titles = response.css('div.jobsearch-SerpJobCard')
    items = []
    for title in titles:
        item = ICcom4Item()
        home_url = ("http://www.indeed.co.uk")
        item ['role_title_link'] = titles.xpath('div[@class="title"]/a/@href').extract()[0] 

        items.append(item)
    return items

I just need the correct link from each job to appear. All help welcome!

Upvotes: 0

Views: 26

Answers (1)

reisdev

Reputation: 3403

The problem is in the line below:

item ['role_title_link'] = titles.xpath('div[@class="title"]/a/@href').extract()[0] 

Instead of titles.xpath, you should use title.xpath, like below:

item ['role_title_link'] = title.xpath('div[@class="title"]/a/@href').extract()[0] 

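titles is the full list of job cards, so titles.xpath(...) collects the links from every card and [0] always picks the first one. title is the single card for the current iteration, so with this change your code will scrape the link for each job, as you want.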
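
To see why the loop variable matters, here is a minimal sketch that reproduces the effect without Scrapy. It uses Python's standard xml.etree.ElementTree as a stand-in for Scrapy's selectors, and the markup, the jk values, and the two function names are made up for illustration; only the class names come from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modelled on the snippet in the question.
PAGE = """
<html><body>
  <div class="jobsearch-SerpJobCard">
    <div class="title"><a href="/rc/clk?jk=aaa">Project Manager</a></div>
  </div>
  <div class="jobsearch-SerpJobCard">
    <div class="title"><a href="/rc/clk?jk=bbb">Senior Project Manager</a></div>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE)
cards = root.findall('.//div[@class="jobsearch-SerpJobCard"]')


def links_from_whole_set(root, cards):
    """Mirrors the bug: the query runs over every card, not the current one."""
    out = []
    for card in cards:
        # The loop variable is ignored, so the match list is identical
        # on every iteration and [0] is always the first job's link.
        all_links = root.findall('.//div[@class="title"]/a')
        out.append(all_links[0].get("href"))
    return out


def links_per_card(cards):
    """Mirrors the fix: the query is relative to the current card only."""
    out = []
    for card in cards:
        link = card.find('div[@class="title"]/a')
        if link is not None:  # skip cards that have no title link
            out.append(link.get("href"))
    return out


print(links_from_whole_set(root, cards))
print(links_per_card(cards))
```

The first function repeats the first link once per card, which is the behaviour described in the question; the second returns one distinct link per card.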

Upvotes: 1
