wiretext

Reputation: 3352

How to insert data from Scrapy into MySQL

I am trying to scrape data from Amazon using Scrapy. I can get the data into a CSV file, but I am not able to insert it into a MySQL database. Please find my code below. My spider is:

import scrapy
from craigslist_sample.items import AmazonDepartmentItem
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor

class AmazonAllDepartmentSpider(scrapy.Spider):

    name = "amazon"
    allowed_domains = ["amazon.com"]
    start_urls = [
        "http://www.amazon.com/gp/site-directory/ref=nav_sad/187-3757581-3331414"
    ]
    def parse(self, response):
        for sel in response.xpath('//ul/li'):
            item = AmazonDepartmentItem()
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            item['desc'] = sel.xpath('text()').extract()
        return item
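
(For reference, AmazonDepartmentItem is imported from craigslist_sample.items but its definition is not shown; a minimal version, inferred from the three fields the spider sets, would look like this:)

import scrapy

class AmazonDepartmentItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
    desc = scrapy.Field()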

My pipeline code is:

import sys
import MySQLdb
import hashlib
from scrapy.exceptions import DropItem
from scrapy.http import Request

class MySQLStorePipeline(object):


    host = 'rerhr.com'
    user = 'amazon'
    password = 'sads23'
    db = 'amazon_project'

    def __init__(self):
        self.connection = MySQLdb.connect(self.host, self.user, self.password, self.db)
        self.cursor = self.connection.cursor()

    def process_item(self, item, spider):   
        try:
            self.cursor.execute("""INSERT INTO amazon_project.ProductDepartment (ProductDepartmentLilnk) 
                            VALUES (%s)""",
                           (
                            item['link'].encode('utf-8')))

            self.connection.commit()

        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])
        return item

When I run the following command:

scrapy crawl amazon -o items.csv -t csv

then I am able to get the data in my CSV, but when I run

scrapy crawl amazon

with the above code, I am not able to insert the data into MySQL. Please help me: what do I have to do so that the data gets inserted into MySQL?

Thanks.

Upvotes: 1

Views: 6059

Answers (1)

alecxe

Reputation: 474271

The problem is actually inside the parse() callback: extract() calls return a list and, as a result, all item field values become lists. The item['link'].encode('utf-8') call in the pipeline then fails because a list has no encode() method.
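
To see the failure in isolation, calling encode() on a list (here with a made-up URL) raises exactly that kind of error:

>>> [u'http://www.amazon.com/books'].encode('utf-8')
Traceback (most recent call last):
  ...
AttributeError: 'list' object has no attribute 'encode'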

A quick and simple fix would be to get the first elements of the extract() call results:

def parse(self, response):
    for sel in response.xpath('//ul/li'):
        item = AmazonDepartmentItem()
        item['title'] = sel.xpath('a/text()').extract()[0]
        item['link'] = sel.xpath('a/@href').extract()[0]
        item['desc'] = sel.xpath('text()').extract()[0]
        yield item

Note that I've also replaced the return item expression with a yield item and put it inside the loop.


A better approach would be to define an ItemLoader with input and output processors:

from scrapy.contrib.loader import ItemLoader
from scrapy.contrib.loader.processor import TakeFirst

class ProductLoader(ItemLoader):
    default_output_processor = TakeFirst()

FYI, here is what TakeFirst() does:

Returns the first non-null/non-empty value from the values received, so it’s typically used as an output processor to single-valued fields. It doesn’t receive any constructor arguments, nor accept Loader contexts.
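
For example, given a list of extracted values it skips None and empty strings and returns the first real value:

>>> from scrapy.contrib.loader.processor import TakeFirst
>>> TakeFirst()([None, u'', u'Appliances', u'Books'])
u'Appliances'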

Then, the parse() method would transform into:

def parse(self, response):
    for sel in response.xpath('//ul/li'):
        l = ProductLoader(item=AmazonDepartmentItem(), selector=sel)
        l.add_xpath('title', 'a/text()')
        l.add_xpath('link', 'a/@href')
        l.add_xpath('desc', 'text()')
        yield l.load_item()
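
With either approach, item['link'] reaches the pipeline as a single unicode string, so the existing encode('utf-8') call and the INSERT statement work unchanged. As a side note, you may also want to release the database connection when the crawl finishes; a minimal sketch, based on the MySQLdb pipeline from the question:

def close_spider(self, spider):
    # Scrapy calls this once when the spider is closed.
    self.cursor.close()
    self.connection.close()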

Upvotes: 7
