user1551211

Reputation: 557

Remove whitespace with the strip method in a Python Scrapy script: how to avoid None from extract_first()

Calling the strip method fails when extract_first() returns None because nothing matched. I would like to know a better way to handle this.

import scrapy

class GamesSpider(scrapy.Spider):
    name = "games"
    start_urls = [
        'myurl',
    ]

    def parse(self, response):
        for game in response.css('ol#products-list li.item'):
            yield {
                'name': game.css('h2.product-name a::text').extract_first().strip(),
                'age': game.css('.list-price ul li:nth-child(1)::text').extract_first().strip(),
                'players': game.css('.list-price ul li:nth-child(2)::text').extract_first().strip(),
                'duration': game.css('.list-price ul li:nth-child(3)::text').extract_first().strip(),
                'dimensions': game.css('.list-price ul li:nth-child(4)::text').extract_first().strip()
            }

Upvotes: 0

Views: 1251

Answers (2)

stranac

Reputation: 28236

The most robust way to handle data like this is to use an item loader with an appropriate processor.
It has the added benefit of making your parsing code less cluttered.

The code to do so might look like this:

import scrapy
from scrapy.loader import ItemLoader
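# Note: newer Scrapy releases provide these processors in the separate
# itemloaders package (itemloaders.processors).
from scrapy.loader.processors import TakeFirst, Compose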


class GameLoader(ItemLoader):
    default_output_processor = Compose(TakeFirst(), str.strip)


class GamesSpider(scrapy.Spider):
    # spider setup skipped
    def parse(self, response):
        for game in response.css('ol#products-list li.item'):
            loader = GameLoader(item={}, selector=game)
            loader.add_css('name', 'h2.product-name a::text')
            loader.add_css('age', '.list-price ul li:nth-child(1)::text')
            loader.add_css('players', '.list-price ul li:nth-child(2)::text')
            loader.add_css('duration', '.list-price ul li:nth-child(3)::text')
            loader.add_css('dimensions', '.list-price ul li:nth-child(4)::text')
            yield loader.load_item()
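
To see why this avoids the error, here is a rough plain-Python sketch of what `Compose(TakeFirst(), str.strip)` does with the extracted values. It assumes the default behaviour where `TakeFirst` skips `None` and empty values and `Compose` stops as soon as a step yields `None`. The names `take_first` and `first_stripped` are hypothetical helpers for illustration, not part of Scrapy.

```python
from typing import Iterable, Optional


def take_first(values: Iterable[Optional[str]]) -> Optional[str]:
    """Return the first value that is neither None nor empty, else None."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


def first_stripped(values: Iterable[Optional[str]]) -> Optional[str]:
    """Take the first usable value and strip it; pass None through untouched."""
    first = take_first(values)
    if first is None:
        # Nothing matched the selector, so strip() is never called.
        return None
    return first.strip()


print(first_stripped(["  2-4 players  "]))
print(first_stripped([]))
```

With a matching selector the value comes back stripped; with no match the result is simply `None` instead of an exception.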

Upvotes: 2

Rom

Reputation: 1838

The Scrapy documentation (https://doc.scrapy.org/en/latest/intro/tutorial.html) says:

using .extract_first() avoids an IndexError and returns None when it doesn’t find any element matching the selection.

So some of your extractions return None rather than a string, and calling strip() on None raises AttributeError: 'NoneType' object has no attribute 'strip'. You need to handle the case where None is returned.
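
One way to handle it is a small helper that strips only when a value was actually extracted. This is a minimal sketch; `strip_or_none` is a hypothetical name, and the game title in the usage lines is made up for illustration.

```python
from typing import Optional


def strip_or_none(value: Optional[str]) -> Optional[str]:
    """Return value.strip() for a string, or None when nothing was extracted."""
    if value is None:
        return None
    return value.strip()


# Inside parse(), each field would wrap the extracted value, for example:
#   'name': strip_or_none(game.css('h2.product-name a::text').extract_first()),
# Alternatively, extract_first(default='') returns an empty string instead of
# None, so .strip() can be called on the result directly.
print(strip_or_none("  Catan \n"))
print(strip_or_none(None))
```

The helper keeps `None` for missing fields, which makes them easy to tell apart from fields that were present but blank; the `default=''` variant is shorter if that distinction does not matter.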

Upvotes: 1
