KennyPowers

Reputation: 5015

Scrapy CSV crawling

I'm trying to crawl some rows from a CSV file using CSVFeedSpider. The structure of the file is: id | category | price. I need to crawl only the rows that have a specific category, "paid". This is what I do:

from scrapy.contrib.spiders import CSVFeedSpider

class Outillage_spider(CSVFeedSpider):
    name = 'domain.com'
    allowed_domains = ['domain.com', 'www.domain.com']
    start_urls = ('http://www.domain.com/file.csv',)

    delimiter = ';'
    headers = ['name', 'category', 'price']

    def parse_row(self, response, row):
        categories = ['Bosch','Dolmar','Fein','Hitachi','Karcher','Leman','Makita','SDMO','Ski']
        if row['category'] in categories:
            res = {}
            res['name'] = row['name']
            res['price'] = row['price']
            return load_product(res, response)
        else:
            return None

And this is what I get:

  File "/home/rolikoff/web/scrapy_projects/local/lib/python2.7/site-packages/Scrapy-0.14.1-py2.7.egg/scrapy/contrib/spiders/feed.py", line 129, in parse_rows
    raise TypeError('You cannot return an "%s" object from a spider' % type(ret).__name__)
exceptions.TypeError: You cannot return an "NoneType" object from a spider

I think it happens when parse_row() returns None, but I'm not sure how to change the function. Do you have any ideas?

Thanks, Dmitry

Upvotes: 0

Views: 814

Answers (2)

reclosedev

Reputation: 9522

Try returning an empty list or tuple instead of None:

else:
    return []

Also make sure that load_product returns a list, tuple, Item or Request.
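
A minimal sketch of this suggestion, written as a plain function so it runs without Scrapy. The category list and field names come from the question; `load_product` here is a hypothetical stand-in that wraps the dict in a list, since the real helper isn't shown.

```python
# Category values copied from the question's spider.
CATEGORIES = ['Bosch', 'Dolmar', 'Fein', 'Hitachi', 'Karcher',
              'Leman', 'Makita', 'SDMO', 'Ski']


def load_product(res, response):
    # Hypothetical stand-in: the real load_product is not shown in the
    # question. It is assumed here to return a list of results.
    return [res]


def parse_row(response, row):
    # Rows in a wanted category produce a result list.
    if row['category'] in CATEGORIES:
        res = {'name': row['name'], 'price': row['price']}
        return load_product(res, response)
    # Rows to skip produce an empty list instead of None.
    return []
```

With this shape, every code path returns a list, so a skipped row simply contributes no results.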

Upvotes: 1

Arthur Neves

Reputation: 12138

As far as I know, you have to yield the items within parse_row. For example, this is a spider that I wrote for crawling podcast URLs: https://github.com/arthurnn/podcast/blob/master/podcast/spiders/itunes_spider.py

I would remove the else. Try this:

if row['category'] in categories:
    res = {}
    res['name'] = row['name']
    res['price'] = row['price']
    yield load_product(res, response)

However, that applies if you are using a normal spider. For a CSVFeedSpider, read my edit below:

EDIT

In this case you have to return a BaseItem, a list or a tuple. If you look at the implementation of CSVFeedSpider (http://dev.scrapy.org/browser/scrapy/contrib/spiders/feed.py?rev=1516), you will see that
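
For illustration, here is a rough sketch of the kind of return-type check that the traceback in the question points at. It is modeled on the error message and on the rule stated above, not copied from feed.py; `BaseItem` is a local stand-in class and `check_result` is a hypothetical name.

```python
class BaseItem(object):
    """Local stand-in for Scrapy's item base class (an assumption)."""


def check_result(ret):
    # A single item is wrapped in a list so it can be iterated.
    if isinstance(ret, BaseItem):
        ret = [ret]
    # Anything that is still not a list or tuple is rejected,
    # which covers both None and generator objects.
    if not isinstance(ret, (list, tuple)):
        raise TypeError('You cannot return an "%s" object from a spider'
                        % type(ret).__name__)
    return ret
```

Under this check, a parse_row that uses yield returns a generator object, which is rejected in the same way as None.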

Upvotes: 1
