user6855378

Scrapy JSON output - values empty

I would like to crawl a set of web pages using Scrapy. However, when I try to write some values to the JSON file, those fields come out empty.

Here is my code:

import scrapy

class LLPubs(scrapy.Spider):
    name = "linlinks"
    start_urls = [
        'http://www.linnaeuslink.org/records/record/1',
        'http://www.linnaeuslink.org/records/record/2',
    ]

    def parse(self, response):
        for container in response.css('div.item'):
            yield {
                'text': container.css('div.field.soulsbyNo .value span::text').extract(),
                'uniformtitle': container.css('div.field.uniformTitle .value span::text').extract(),
                'title': container.css('div.field.title .value span::text').extract(),
                'opac': container.css('div.field.localControlNo .value span::text').extract(),
                'url': container.css('div#digitalLinks li a').extract(),
                'partner': container.css('div.logoContainer img:first-child').xpath('@src').extract(),
            }

And an example of my output:

{
"text": ["Soulsby no. 46(1)"], 
"uniformtitle": ["Systema naturae"], 
"title": ["Caroli Linn\u00e6i ... Systema natur\u00e6\nin quo natur\u00e6 regna tria, secundum classes, ordines, genera, species, systematice proponuntur."], 
"opac": ["002178079"], 
"url": [], 
"partner": []
},

I am hoping I am doing something silly and easy to fix! Both of the selectors I am using for "url" and "partner" worked when I tried them here:

scrapy shell 'http://www.linnaeuslink.org/records/record/1'

So, I just don't know what I am missing.

Oh, and I am exporting to JSON with this command for now:

scrapy crawl linlinks -o quotes.json

Thanks for your help!

Upvotes: 0

Views: 640

Answers (1)

Wilfredo

Reputation: 1548

The problem seems to be that those selectors are not "findable" inside any div.item. You probably validated them in the shell against the whole response, without first narrowing to response.css('div.item'). To replicate what you did in the shell, replace container.css with response.css for the url and partner keys.

Upvotes: 1
