Select results one by one in scrapy

Question

I downloaded the source of a page from Indeed and I'm trying to get all the job titles from it. For that I'm using this XPath:

response.xpath('//*[@class="  row  result"]//*[@class="jobtitle"]//text()').extract()

The issue is that each title is split across several text nodes, so instead of one string per job I'm getting this result:

[u'
    ',
 u'Data',
 u' ',
 u'Scientist',
 u' Experto SQL con conocimiento en R',
 u'
    ',
 u'
    ',
 u'Data',
 u' Analytic con Python',
 u'
    ',
 u'
    ',
 u'Data',
 u' Analytic con R',

This is hard to map to the rest of the data. What I want is to select and process the jobs one by one, similar to extract_first():

response.xpath('//*[@class="  row  result"]').extract_first()

I need that for any given index, though, and with the option to keep processing the data. I tried this:

from scrapy.http import TextResponse

current_job = response.xpath('//*[@class="  row  result"]').extract_first()
current_job = TextResponse(url='', body=current_job, encoding='utf-8')

But it only works for the first result and it doesn't look like a pythonic approach to me.

furas · Accepted Answer

First I would select only the a elements (without text() and extract()), then use a for loop to apply text() and extract() to each a separately, and use join() to concatenate the pieces into a single title string.

import scrapy

class MySpider(scrapy.Spider):

    name = 'myspider'

    start_urls = ['https://www.indeed.cl/trabajo?q=Data%20scientist&l=']

    def parse(self, response):
        print('url:', response.url)

        results = response.xpath('//h2[@class="jobtitle"]/a')
        print('number:', len(results))

        for item in results:
            title = ''.join(item.xpath('.//text()').extract())
            print('title:', title)

# --- it runs without a Scrapy project; the titles are only printed, nothing is saved ---

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
})
c.crawl(MySpider)
c.start()

Result:

number: 10
title: Data Scientist
title: CONSULTOR DATA SCIENCE SANTIAGO DE CHILE
title: Líder Análisis de Datos MCoE Minerals Americas
title: Ingeniero Inteligencia Mercado, BI
title: Ingeniero Inteligencia de Mercado, Business Intelligence
title: Data Scientist
title: Data Scientist
title: Data Scientist (Machine Learning)
title: Data Scientist / Ml Scientist
title: Young Professional - Spanish LatAm
