Reputation: 1007
I have very simple code, shown below. Scraping works fine: I can see all the print statements generating correct data. In the pipeline, initialization works fine, but the process_item function is never called, as the print statement at the start of that function never executes.
Spider: comosham.py
import scrapy
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from activityadvisor.items import ComoShamLocation
from activityadvisor.items import ComoShamActivity
from activityadvisor.items import ComoShamRates
import re

class ComoSham(Spider):

    name = "comosham"
    allowed_domains = ["www.comoshambhala.com"]
    start_urls = [
        "http://www.comoshambhala.com/singapore/classes/schedules",
        "http://www.comoshambhala.com/singapore/about/location-contact",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes/rates-private-classes"
    ]

    def parse(self, response):
        category = (response.url)[39:44]
        print 'in parse'
        if category == 'class':
            pass
            """self.gen_req_class(response)"""
        elif category == 'about':
            print 'about to call parse_location'
            self.parse_location(response)
        elif category == 'rates':
            pass
            """self.parse_rates(response)"""
        else:
            print 'Cant find appropriate category! check check check!! Am raising Level 5 ALARM - You are a MORON :D'

    def parse_location(self, response):
        print 'in parse_location'
        item = ComoShamLocation()
        item['category'] = 'location'
        loc = Selector(response).xpath('((//div[@id = "node-2266"]/div/div/div)[1]/div/div/p//text())').extract()
        item['address'] = loc[2] + loc[3] + loc[4] + (loc[5])[1:11]
        item['pin'] = (loc[5])[11:18]
        item['phone'] = (loc[9])[6:20]
        item['fax'] = (loc[10])[6:20]
        item['email'] = loc[12]
        print item['address'], item['pin'], item['phone'], item['fax'], item['email']
        return item
Items file:
import scrapy
from scrapy.item import Item, Field

class ComoShamLocation(Item):
    address = Field()
    pin = Field()
    phone = Field()
    fax = Field()
    email = Field()
    category = Field()
Pipeline file:
import csv

class ComoShamPipeline(object):

    def __init__(self):
        self.locationdump = csv.writer(open('./scraped data/ComoSham/ComoshamLocation.csv', 'wb'))
        self.locationdump.writerow(['Address', 'Pin', 'Phone', 'Fax', 'Email'])

    def process_item(self, item, spider):
        print 'processing item now'
        if item['category'] == 'location':
            print item['address'], item['pin'], item['phone'], item['fax'], item['email']
            self.locationdump.writerow([item['address'], item['pin'], item['phone'], item['fax'], item['email']])
        else:
            pass
Upvotes: 13
Views: 8683
Reputation: 1489
This solved my problem: I was dropping all items before this pipeline got called, so process_item() was never called even though open_spider and close_spider were. My solution was simply to reorder the pipelines so that this one runs before the other pipeline that drops items.
Scrapy Pipeline Documentation.
Just remember that Scrapy calls Pipeline.process_item() only if there is an item to process!
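For illustration, a minimal sketch with hypothetical project and pipeline names: the values in ITEM_PIPELINES set the order (lowest runs first), so the exporting pipeline should get a lower number than the one that raises DropItem.
# settings.py -- hypothetical names; lower value = runs earlier
ITEM_PIPELINES = {
    'myproject.pipelines.CsvExportPipeline': 100,  # runs first, sees every item
    'myproject.pipelines.FilterPipeline': 200,     # may drop items afterwards
}

# pipelines.py -- a dropped item never reaches later pipelines
from scrapy.exceptions import DropItem

class FilterPipeline(object):
    def process_item(self, item, spider):
        if item.get('category') != 'location':
            raise DropItem('not a location item')  # item stops here
        return item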
Upvotes: 1
Reputation: 1105
Adding to the answers above:
1. Remember to add the following line to settings.py!
ITEM_PIPELINES = {'[YOUR_PROJECT_NAME].pipelines.[YOUR_PIPELINE_CLASS]': 300}
2. Yield the item when your spider runs!
yield my_item
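Putting both points together, a minimal sketch (spider, item, and URL here are hypothetical): process_item() only runs for items the spider actually yields.
import scrapy

class MyItem(scrapy.Item):
    title = scrapy.Field()

class MySpider(scrapy.Spider):
    name = 'myspider'
    start_urls = ['http://example.com']

    def parse(self, response):
        my_item = MyItem()
        my_item['title'] = response.xpath('//title/text()').extract_first()
        yield my_item  # without this yield, no pipeline ever sees the item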
Upvotes: 0
Reputation: 867
Use ITEM_PIPELINES in settings.py:
ITEM_PIPELINES = ['project_name.pipelines.pipeline_class']
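Note that in newer Scrapy versions ITEM_PIPELINES must be a dict rather than a list; the value (0-1000) controls the order in which pipelines run, lowest first:
# settings.py, using this answer's placeholder names
ITEM_PIPELINES = {
    'project_name.pipelines.pipeline_class': 300,
}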
Upvotes: 5
Reputation: 2187
Your problem is that you never actually yield the item. parse_location returns an item back to parse, but parse never yields it.
The solution would be to replace:
self.parse_location(response)
with
yield self.parse_location(response)
More specifically, process_item never gets called if no items are yielded.
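Applied to the spider in the question, the relevant branch of parse becomes:
def parse(self, response):
    category = (response.url)[39:44]
    if category == 'about':
        # yield the item returned by parse_location so it reaches the pipeline
        yield self.parse_location(response)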
Upvotes: 15