Reputation: 557
When a field is empty, `extract_first()` returns `None` and the `.strip()` call on it fails. What is the best way to handle this?
```python
import scrapy


class GamesSpider(scrapy.Spider):
    name = "games"
    start_urls = [
        'myurl',
    ]

    def parse(self, response):
        for game in response.css('ol#products-list li.item'):
            yield {
                'name': game.css('h2.product-name a::text').extract_first().strip(),
                'age': game.css('.list-price ul li:nth-child(1)::text').extract_first().strip(),
                'players': game.css('.list-price ul li:nth-child(2)::text').extract_first().strip(),
                'duration': game.css('.list-price ul li:nth-child(3)::text').extract_first().strip(),
                'dimensions': game.css('.list-price ul li:nth-child(4)::text').extract_first().strip()
            }
```
Upvotes: 0
Reputation: 28236
The most robust way to handle data like this is to use an item loader with an appropriate output processor.
It has the added benefit of making your parsing code look less cluttered.
The code to do so might look like this:
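```python
import scrapy
from scrapy.loader import ItemLoader
# Recent Scrapy versions provide the processors through the separate
# itemloaders package; older versions import them from scrapy.loader.processors.
from itemloaders.processors import TakeFirst, Compose


class GameLoader(ItemLoader):
    # TakeFirst() returns the first non-empty value (or None if there is none).
    # Compose stops on None by default, so str.strip only ever sees strings.
    default_output_processor = Compose(TakeFirst(), str.strip)


class GamesSpider(scrapy.Spider):
    # spider setup skipped

    def parse(self, response):
        for game in response.css('ol#products-list li.item'):
            loader = GameLoader(item={}, selector=game)
            loader.add_css('name', 'h2.product-name a::text')
            loader.add_css('age', '.list-price ul li:nth-child(1)::text')
            loader.add_css('players', '.list-price ul li:nth-child(2)::text')
            loader.add_css('duration', '.list-price ul li:nth-child(3)::text')
            loader.add_css('dimensions', '.list-price ul li:nth-child(4)::text')
            # Fields whose output value is None are left out of the item.
            yield loader.load_item()
```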
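To see why this processor chain copes with missing fields, here is a dependency-free sketch that mimics the two processors. `take_first` and `compose` are hypothetical stand-ins written for illustration, not Scrapy's own classes: they assume the documented behaviour that `TakeFirst` returns the first value that is neither `None` nor an empty string, and that `Compose` stops and returns `None` once a step yields `None` (its default `stop_on_none=True`).

```python
from typing import Any, Callable, Iterable, Optional


def take_first(values: Iterable[Any]) -> Optional[Any]:
    """Hypothetical stand-in for TakeFirst: first value that is neither None nor ''."""
    for value in values:
        if value is not None and value != '':
            return value
    return None


def compose(*functions: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Hypothetical stand-in for Compose with stop_on_none=True."""
    def process(value: Any) -> Any:
        for func in functions:
            if value is None:
                # Stop early, so str.strip is never called on None.
                break
            value = func(value)
        return value
    return process


# Same chain as GameLoader.default_output_processor, using the stand-ins.
output_processor = compose(take_first, str.strip)

print(repr(output_processor(['  Catan \n', 'ignored'])))  # 'Catan'
print(repr(output_processor([])))  # None
```

A selector that matches nothing produces an empty list, `take_first` turns that into `None`, and the chain returns `None` instead of raising.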
Upvotes: 2
Reputation: 1838
The Scrapy tutorial (https://doc.scrapy.org/en/latest/intro/tutorial.html) says:
"using .extract_first() avoids an IndexError and returns None when it doesn’t find any element matching the selection."
So some of your `extract_first()` calls return `None` instead of a string, and calling `.strip()` on `None` raises `AttributeError: 'NoneType' object has no attribute 'strip'`. You need to handle the case where `None` is returned.
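One minimal way to handle it without an item loader is a small helper that strips strings and passes `None` through. `strip_or_none` is a hypothetical name chosen for this sketch, and the `raw` dict below only simulates `extract_first()` results, with `None` standing in for a selector that matched nothing. Alternatively, `extract_first()` accepts a `default` argument, so `extract_first(default='').strip()` yields an empty string for missing fields instead of `None`.

```python
from typing import Optional


def strip_or_none(value: Optional[str]) -> Optional[str]:
    """Hypothetical helper: strip a string, but return None unchanged."""
    return value.strip() if value is not None else None


# Simulated extract_first() results; None means the selector matched nothing.
raw = {'name': '  Catan \n', 'age': None, 'players': ' 3-4 '}
cleaned = {field: strip_or_none(value) for field, value in raw.items()}
print(cleaned)  # {'name': 'Catan', 'age': None, 'players': '3-4'}
```

In the spider, each field would then read like `'name': strip_or_none(game.css('h2.product-name a::text').extract_first())`.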
Upvotes: 1