Reputation: 1007
I have this pipeline code:
class MyImagesPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        for image_url in item['image_urls']:
            yield Request(image_url)
And this is the spider, subclassed from BaseSpider. This BaseSpider is giving me nightmares:
def parse(self, response):
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//strong[@class="genmed"]')
    items = []
    for site in sites[:5]:
        item = PanduItem()
        item['username'] = site.select('dl/dd/h2/a').select("string()").extract()
        item['number_posts'] = site.select('dl/dd/h2/em').select("string()").extract()
        item['profile_link'] = site.select('a/@href').extract()
        request = Request("http://www.example/profile.php?mode=viewprofile&u=5",
                          callback=self.parseUserProfile)
        request.meta['item'] = item
        return request

def parseUserProfile(self, response):
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//div[@id="current')
    myurl = sites[0].select('img/@src').extract()
    item = response.meta['item']
    image_absolute_url = urljoin(response.url, myurl[0].strip())
    item['image_urls'] = [image_absolute_url]
    return item
This is the error I am getting, and I am not able to find the cause. It looks like the pipeline is getting the item, but I am not sure.
ERROR
  File "/app_crawler/crawler/pipelines.py", line 9, in get_media_requests
    for image_url in item['image_urls']:
exceptions.TypeError: 'NoneType' object has no attribute '__getitem__'
Upvotes: 2
Views: 2189
Reputation: 23
You are missing a method in your pipelines.py. The said file contains 3 methods:
The item_completed method is called once all image requests for an item have finished, and the images are saved under a path that is set in settings.py as below:
ITEM_PIPELINES = ['scrapy.contrib.pipeline.images.ImagesPipeline']
IMAGES_STORE = '/your/path/here'
The ITEM_PIPELINES line shown above is the one that enables the images pipeline.
I've tried to explain it as best I understand it. For further reference, have a look at the official Scrapy documentation.
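It may also be worth checking any other pipeline that runs before the images pipeline. Scrapy passes the return value of each pipeline's process_item on to the next pipeline, so a process_item that forgets to return the item hands None to the next stage, which would match the 'NoneType' error in the question. Whether that applies here is a guess, since the question does not show the settings. The following is a plain-Python simulation of that behaviour, not Scrapy code; run_pipelines, both class names, and the sample URL are made up for illustration:

```python
class ForgetfulPipeline(object):
    """Hypothetical pipeline that forgets to return the item."""

    def process_item(self, item, spider):
        item["checked"] = True
        # Bug: no `return item`, so this method returns None


class ImageUrlPipeline(object):
    """Hypothetical stand-in for a pipeline that reads item['image_urls']."""

    def process_item(self, item, spider):
        # Raises TypeError when item is None
        return list(item["image_urls"])


def run_pipelines(item, pipelines, spider=None):
    # Feed each stage the previous stage's return value
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item


if __name__ == "__main__":
    item = {"image_urls": ["http://www.example/avatar.png"]}  # placeholder URL
    try:
        run_pipelines(item, [ForgetfulPipeline(), ImageUrlPipeline()])
    except TypeError as exc:
        print("TypeError:", exc)
```

If this turns out to be the cause, adding `return item` at the end of the earlier pipeline's process_item should make the error go away.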
Upvotes: 2
Reputation: 1
Also, set the IMAGES_STORE setting to a valid directory that will be used for storing the downloaded images. Otherwise the pipeline will remain disabled, even if you include it in the ITEM_PIPELINES setting.
For example:
IMAGES_STORE = '/path/to/valid/dir'
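A quick way to rule this out is to check the directory before starting the crawl. This is only a sketch using the standard library, assuming you want the directory to exist and be writable beforehand; the helper name check_images_store is made up for illustration and is not part of Scrapy:

```python
import os
import tempfile


def check_images_store(path):
    """Return True if path is an existing, writable directory.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    return os.path.isdir(path) and os.access(path, os.W_OK)


if __name__ == "__main__":
    # A temporary directory stands in for the real IMAGES_STORE value
    images_store = tempfile.mkdtemp()
    print(check_images_store(images_store))
    print(check_images_store(os.path.join(images_store, "missing")))
```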
Upvotes: 0
Reputation: 7889
Hmmm. At no point are you appending item to items (although the example code in the documentation doesn't do an append either, so I could be barking up the wrong tree).
Try adding it to parse(self, response) like so and see if this resolves the issue:
for site in sites:
    item = PanduItem()
    item['username'] = site.select('dl/dd/h2/a').select("string()").extract()
    item['number_posts'] = site.select('dl/dd/h2/em').select("string()").extract()
    item['profile_link'] = site.select('a/@href').extract()
    items.append(item)
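On a related note, if the return in the original parse sits inside the for loop, the method exits on the first iteration and at most one request is ever produced. The sketch below illustrates the difference in plain Python, without Scrapy; the function names and the sample data are made up for illustration:

```python
def first_only(sites):
    # `return` inside the loop exits on the first iteration
    for site in sites:
        return {"username": site}


def collect_all(sites):
    # Appending to a list keeps every item, as suggested above
    items = []
    for site in sites:
        items.append({"username": site})
    return items


def generate_all(sites):
    # `yield` turns the function into a generator that produces every item
    for site in sites:
        yield {"username": site}


if __name__ == "__main__":
    sites = ["alice", "bob", "carol"]  # stand-in data
    print(first_only(sites))
    print(len(collect_all(sites)))
    print(len(list(generate_all(sites))))
```

In a spider callback, either returning the full list or yielding each request or item should let every entry through rather than only the first.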
Upvotes: 0