Reputation: 23
In my spider I have two pipelines, one for images and one for files. In the spider I check for image results like so:
if len(self.img_) >= 0:
    item['image_urls'] = self.img_
and do the same for files:
if len(self.result) >= 0:
    item['file_urls'] = self.result
My settings for the pipelines are like so:
'ScrawlerScraper.images.ImgPipeline': 1,
'ScrawlerScraper.fld_pipeline.FilesPipeline': 2
The process_item in the image pipeline, which runs first, is like so:
def process_item(self, item, spider):
    try:
        request = [x for x in item['image_urls']]
        for x in request:
            dfd = spider.crawler.engine.download(x, spider)
            dfd.addBoth(self.return_item, item)
        return dfd
    except logging.info('No images'):
        return item
The files pipeline is the same:
def process_item(self, item, spider):
    try:
        request = [x for x in item['file_urls']]
        for x in request:
            dfd = spider.crawler.engine.download(x, spider)
            dfd.addBoth(self.return_item, item)
        return dfd
    except logging.error('No Files Problem'):
        return item
I do this because I am getting multiple images and can only yield the item once in the spider.
What I don't understand is why I am getting an error in the second pipeline:
fld_pipeline.py", line 23, in process_item
    request = [x for x in item['file_urls']]
TypeError: 'NoneType' object has no attribute '__getitem__'
If there is no item['file_urls'], the pipeline should not even execute.
I'm totally lost as to what is happening.
Thanks for any help, and I hope my question is not too badly formatted.
This is where I check for a file that exists, and return the item if not. Maybe I have it wrong, in that I am inside a loop and trying to return the item, but it comes back to the next pipeline as None:
for x in item['image_urls']:
    try:
        a = urlparse.urlparse(x)
    except TypeError:
        logging.info('No images')
        return item
    file_name = os.path.basename(a.path)
    try:
        u = sock.Worker().get_url(x)
    except TypeError:
        logging.info('No images')
        return item
    if u:
        try:
            os.path.exists(settings.get('IMAGES_STORE') + file_name)
            logging.info("Image Already Downloaded: %s " % a.path)
            item['image_urls'].append(settings.get('IMAGES_STORE') + file_name)
            return item
        except TypeError:
            data = u.read()
            with open(settings.get('IMAGES_STORE') + file_name, 'wb') as code:
                code.write(data)
            logging.info("Downloading (%s)" % a.path,)
            item['image_urls'].append(settings.get('IMAGES_STORE') + file_name)
            return item
Upvotes: 0
Views: 1603
Reputation: 855
Your try/except clause isn't right. You must list exception types after except, not a call telling it what to log. Try this:
try:
    request = [x for x in item['image_urls']]
    for x in request:
        dfd = spider.crawler.engine.download(x, spider)
        dfd.addBoth(self.return_item, item)
    return dfd
except TypeError:
    logging.info('No images')
    return item
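To see why the original pattern fails, here is a minimal standalone sketch (the item dict is hypothetical, not your actual Scrapy item): the expression after except is only evaluated when an exception actually propagates out of the try block, and logging.info() returns None, which is not a valid exception class to catch.

```python
import logging

def process(item):
    # Same shape as the pipeline's broken pattern: the expression after
    # `except` is evaluated lazily, only when an exception is raised.
    try:
        return [x for x in item['file_urls']]
    except logging.info('No Files Problem'):  # evaluates to None
        return item

# With a well-formed item, the except expression is never evaluated,
# so the bug stays hidden:
print(process({'file_urls': ['a.pdf', 'b.pdf']}))  # → ['a.pdf', 'b.pdf']

# With item = None, item['file_urls'] raises TypeError. Python then
# evaluates logging.info(...) -> None as the "exception class" to match
# against, and (on Python 3) raises a fresh TypeError: catching classes
# that do not inherit from BaseException is not allowed.
try:
    process(None)
except TypeError as exc:
    print(type(exc).__name__)  # → TypeError
```

So the logging call silently turns the except clause into except None:, and the handler never runs; catching TypeError and logging inside the handler body, as above, is the fix.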
Upvotes: 2