user2521546

Reputation: 23

scrapy pipeline returns 'NoneType' object has no attribute '__getitem__'

In my spider I have two pipelines, one for images and one for files.

In the spider I check for image results like so:

if len(self.img_) >= 0:
    item['image_urls'] = self.img_

I also do the same for files:

if len(self.result) >= 0:
    item['file_urls'] = self.result
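As an aside, `len(...) >= 0` is always true (`len()` is never negative), so these keys get set even when the lists are empty; a plain truthiness test only sets them when there is something to download. A minimal sketch with hypothetical helper names:

```python
def should_set(urls):
    # mirrors the spider's check: len() is never negative,
    # so this is true even for an empty list
    return len(urls) >= 0

def should_set_fixed(urls):
    # true only when there is at least one URL
    return bool(urls)
```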

My settings for the pipelines are:

'ScrawlerScraper.images.ImgPipeline': 1,
'ScrawlerScraper.fld_pipeline.FilesPipeline': 2

My process_item in the image pipeline, which runs first, looks like this:

def process_item(self, item, spider):
    try:
        request = [x for x in item['image_urls']]
        for x in request:
            dfd = spider.crawler.engine.download(x, spider)
            dfd.addBoth(self.return_item, item)
            return dfd
    except logging.info('No images'):
        return item

The files pipeline does the same:

def process_item(self, item, spider):
    try:
        request = [x for x in item['file_urls']]
        for x in request:
            dfd = spider.crawler.engine.download(x, spider)
            dfd.addBoth(self.return_item, item)
            return dfd
    except logging.error('No Files Problem'):
        return item

I do this because I am getting multiple images and can only yield the item once in the spider.

What I don't understand is why I am getting an error in the second pipeline:

fld_pipeline.py", line 23, in process_item
request = [x for x in item['file_urls']]
TypeError: 'NoneType' object has no attribute '__getitem__'

If there is no item['file_urls'], the pipeline should not even execute.
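For reference, a minimal sketch (without Scrapy, using a hypothetical process_item) of how a pipeline can hand None to the next one: when the list is empty, the for loop body never runs, the function falls off the end, and Python implicitly returns None, which is then what the next pipeline receives as item:

```python
def process_item(item):
    # same shape as the pipeline above, minus Scrapy
    requests = [x for x in item['image_urls']]
    for x in requests:
        return x  # only reached when there is at least one URL
    # empty list: loop never runs, function implicitly returns None
```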

I'm totally lost as to what is happening.

Thanks for any help, and I hope my question is not too badly formatted.

This is where I check for a file that exists and return the item if not. Maybe I have it wrong in that I am inside a loop and trying to return the item, but it comes back to the next pipeline as None.

for x in item['image_urls']:
    try:
        a = urlparse.urlparse(x)
    except TypeError:
        logging.info('No images')
        return item
    file_name = os.path.basename(a.path)
    try:
        u = sock.Worker().get_url(x)
    except TypeError:
        logging.info('No images')
        return item
    if u:
        try:
            os.path.exists(settings.get('IMAGES_STORE') + file_name)
            logging.info("Image Already Downloaded: %s " % a.path)
            item['image_urls'].append(settings.get('IMAGES_STORE') + file_name)
            return item
        except TypeError:
            data = u.read()
            with open(settings.get('IMAGES_STORE') + file_name, 'wb') as code:
                code.write(data)
                logging.info("Downloading (%s)" % a.path,)
            item['image_urls'].append(settings.get('IMAGES_STORE') + file_name)
            return item
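Note that the inner try/except above can never take its except branch because of the existence check: os.path.exists() returns a boolean and does not raise TypeError, so the download branch is unreachable from that call. A sketch of the intended download-or-reuse shape, with a hypothetical fetch callable standing in for the sock.Worker().get_url(x) download:

```python
import os

def cache_or_download(path, fetch):
    # fetch is a hypothetical callable returning the file's bytes
    if os.path.exists(path):
        return path  # already downloaded, reuse it
    with open(path, 'wb') as fh:
        fh.write(fetch())
    return path
```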

Upvotes: 0

Views: 1603

Answers (1)

Alex K.

Reputation: 855

Your try/except clause isn't right: after except you must list exception types, not an expression telling it what to log. Try this:

    try:
        request = [x for x in item['image_urls']]
        for x in request:
            dfd = spider.crawler.engine.download(x, spider)
            dfd.addBoth(self.return_item, item)
            return dfd
    except TypeError:
        logging.info('No images') 
        return item
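To see why the original form fails: the expression after except is evaluated only once an exception has been raised, and logging.info() returns None, which is not an exception class, so the TypeError is never caught (Python 3 rejects the resulting except None outright). A standalone sketch comparing the two, outside Scrapy:

```python
import logging

def broken(item):
    try:
        return [x for x in item['image_urls']]
    except logging.info('No images'):  # evaluates to None -> catches nothing
        return item

def fixed(item):
    try:
        return [x for x in item['image_urls']]
    except TypeError:
        logging.info('No images')
        return item
```

broken(None) still raises TypeError, while fixed(None) logs the message and returns the item unchanged.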

Upvotes: 2
