Reputation: 143
I want to get the total number of scraped items and the date the script ran, and insert them into MySQL. I put the code inside a pipeline, but it seems the data gets inserted inside the item loop. Where can I properly put this code so the data is inserted once, when scraping is done?
Any ideas, please? Here's my code:
import mysql.connector

class GpdealsSpiderPipeline_hm(object):
    # some working code here

    def store_db(self, item):
        self.curr.execute("""insert into status_hm (script_lastrun, new_sale_item, no_item_added, total_item) values (%s, %s, %s, %s)""", (
            'sample output',
            'sample output',
            'sample output',
            'sample output',
        ))
        self.conn.commit()
Error: mysql.connector.errors.IntegrityError: 1062 (23000): Duplicate entry '' for key 'PRIMARY'
So I am probably putting my code in the wrong place. Please help, thank you.
Upvotes: 0
Views: 63
Reputation: 10210
A Scrapy pipeline's purpose is to process a single item at a time. However, you can achieve what you want by putting the logic in the close_spider method, which is called once when the spider finishes. You can get the total number of items scraped from the Scrapy stats under the key item_scraped_count. See the example:
class ExamplePipeline(object):
    def close_spider(self, spider):
        stats = spider.crawler.stats.get_stats()
        print('Total number of scraped items:', stats['item_scraped_count'])

    def process_item(self, item, spider):
        # logic to process the item
        return item
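Applied to your case, the single summary row belongs in close_spider. Here is a minimal sketch, assuming the connection is opened in open_spider (the connection parameters below are placeholders), that script_lastrun and total_item are the columns from your snippet, and that the table's primary key auto-increments (so you don't insert a duplicate empty key, which is what your IntegrityError suggests). new_sale_item and no_item_added would come from your own counters, so they are omitted here:

import datetime

import mysql.connector

class GpdealsSpiderPipeline_hm(object):
    def open_spider(self, spider):
        # placeholder connection parameters -- replace with your own
        self.conn = mysql.connector.connect(host='localhost', user='root',
                                            password='', database='gpdeals')
        self.curr = self.conn.cursor()

    def close_spider(self, spider):
        # runs once, after the crawl finishes -- insert the summary row here
        stats = spider.crawler.stats.get_stats()
        self.curr.execute(
            """insert into status_hm (script_lastrun, total_item)
               values (%s, %s)""",
            (datetime.datetime.now(), stats.get('item_scraped_count', 0)),
        )
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # per-item logic (e.g. inserting individual items) stays here
        return item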
For completeness, you can also achieve your goal by connecting to the spider_closed signal from a pipeline, an extension, or the spider itself. See this complete example, which connects to the signal from the spider:
import scrapy
from scrapy import signals

class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['http://quotes.toscrape.com/']

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super(QuotesSpider, cls).from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
        return spider

    def spider_closed(self, spider):
        stats = spider.crawler.stats.get_stats()
        print('Total number of scraped items:', stats['item_scraped_count'])

    def parse(self, response):
        for quote in response.xpath('//div[@class="quote"]'):
            item = {
                'text': quote.xpath('./*[@itemprop="text"]/text()').extract_first()
            }
            yield item
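If you'd rather keep everything in the pipeline, the same signal can be connected from the pipeline's from_crawler classmethod. A minimal sketch (SignalPipeline is a hypothetical name):

from scrapy import signals

class SignalPipeline(object):
    @classmethod
    def from_crawler(cls, crawler):
        # connect the pipeline's own callback to the spider_closed signal
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_closed,
                                signal=signals.spider_closed)
        return pipeline

    def spider_closed(self, spider):
        stats = spider.crawler.stats.get_stats()
        print('Total number of scraped items:', stats['item_scraped_count'])

    def process_item(self, item, spider):
        return item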
Upvotes: 1