federico Guastadisegni

Reputation: 151

Can't get any data with Scrapy

I'm trying to scrape the following site:

http://search.scielo.org/?q=science&lang=pt&count=50&from=0&output=site&sort=&format=summary&fb=&page=1

with this code:

def parse_web11(self, response): 

    for publication in response.css('div.content > div.searchForm > div.container resultBlock > div.col-md-9 col-sm-8 > div.results > div.item'):

        author = publication.xpath('./div[@class="col-md-11 col-sm-10 col-xs-11"]/div[@class="line authors"]/a/text()').extract_first()
        title = publication.xpath('./div[@class="col-md-11 col-sm-10 col-xs-11"]/div[@class="line"]/a/strong[@class="title"]/text()').extract_first()
        doi = publication.css("strong[@class='DOIResults']::text()").extract_first()
        link = publication.xpath('./div[@class="col-md-11 col-sm-10 col-xs-11"]/div[@class="line"]/a/@href').extract_first()
        year = publication.xpath('./div[@class="col-md-11 col-sm-10 col-xs-11"]/div[@class="line source"]/span/text()').re_first(r'\d\d\d\d')


        print(author,title,doi,link,year)
        raw_input()

but I get no results.

Upvotes: 0

Views: 102

Answers (1)

alecxe

Reputation: 473753

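The long selector never matches. In CSS a space is the descendant combinator, so "div.container resultBlock" looks for a <resultBlock> element inside div.container, not for a div carrying both classes (that would be written div.container.resultBlock). The same applies to "div.col-md-9 col-sm-8". You don't need the full path anyway; simplify the publication selector to just: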

div.results > div.item

Demo from the shell:

$ scrapy shell "http://search.scielo.org/?q=science&lang=pt&count=50&from=0&output=site&sort=&format=summary&fb=&page=1"
>>> for publication in response.css('div.results > div.item'):
...     print(publication.xpath('.//a/strong[@class="title"]/text()').extract_first())

Ensaio sobre os nós das redes logísticas
Segurança de pedestres em rotatórias urbanas
...
Comparação do processo de categorização de documentos utilizando palavras-chave e citações em um domínio de conhecimento restrito
A ciência nas regiões brasileiras: evolução da produção e das redes de colaboração científica

Upvotes: 2
