bjj

Weird Scrapy meta attribute issue

I'm trying to scrape this category of the site: n11.com/mutfak-gerecleri

Some products have multiple sizes and prices, like this one: http://urun.n11.com/tava/tefal-titanium-hard-tava-no24-26-28-P99340786

The different prices are loaded by this AJAX call: http://urun.n11.com/sku/stok/99340786/123667829161?newDesign=true&isFashion=false, where the first number after stok is the product id (pid) and the second one is the skuId. The pid can be found in the URL, and the skuIds can be found in the HTML (search for skuList). In this case, the call is triggered by changing the size of the pan.
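For reference, here is a minimal sketch of how the AJAX URL can be assembled from the product URL and a skuId. It assumes, based only on the single example above, that every product URL ends in "-P<digits>" with no query string and that the two query parameters never change; the names AJAX_URL, extract_pid and build_ajax_url are hypothetical helpers, not part of Scrapy or of the site.

```python
import re

# URL template copied from the AJAX call quoted above; the two query
# parameters are assumed to be the same for every product.
AJAX_URL = ("http://urun.n11.com/sku/stok/{pid}/{sku}"
            "?newDesign=true&isFashion=false")

# Assumption: the product URL ends in "-P<digits>" (no query string),
# and those digits are the pid.
PID_RE = re.compile(r"-P(\d+)$")


def extract_pid(product_url):
    """Return the numeric product id at the end of an n11 product URL."""
    match = PID_RE.search(product_url)
    if match is None:
        raise ValueError("no product id found in %r" % product_url)
    return match.group(1)


def build_ajax_url(product_url, sku_id):
    """Build the price/stock AJAX URL for one SKU of a product."""
    return AJAX_URL.format(pid=extract_pid(product_url), sku=sku_id)


if __name__ == "__main__":
    product = ("http://urun.n11.com/tava/"
               "tefal-titanium-hard-tava-no24-26-28-P99340786")
    # Prints the AJAX URL for the first skuId from the question.
    print(build_ajax_url(product, "123667829161"))
```

In the spider below, the ajax_call template plays the role of AJAX_URL here.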

This is the code that makes the AJAX calls and yields the requests to the new callback:

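def parse(self, response):
    # sizes, skuIds and pid are extracted from the page earlier (code omitted);
    # ajax_call is the URL template of the AJAX call described above.
    item = N11ProductItem()
    item['product_url'] = response.url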
    print 'sizes=',sizes
    print 'skuIds=', skuIds
    print
    for sku, size in zip(skuIds, sizes):          
        print 'Sku before overwriting', item.get('sku')
        print 'size before overwriting',item.get('size')
        item['sku'] = sku
        item['size'] = size
        print 'Sku after overwriting', item['sku']
        print 'Size after overwriting',item['size']
        r = Request(
            url=ajax_call.format(pid=pid, sku=sku),
            meta={'item': item, 'dont_merge_cookies': True},
            headers={'X-Requested-With': 'XMLHttpRequest'},
            callback=self.parse_price)
        print "yielding item"
        yield r

And the parse_price method:

def parse_price(self, response):

    item = response.meta['item']
    print "Item in parse_price method: \n", item
    real_price = response.xpath("//div[@id='oldPrice']/text()").extract()
    item['real_price'] = real_price[0] if real_price else ''

    display_price = response.xpath("//div[@id='displayPrice']/text()").extract()
    item['discounted_price'] = display_price[0] if display_price else ''
    yield item

I should mention that I tried the code with and without the dont_merge_cookies meta key, as well as with and without the dont_filter request parameter.

I am not creating a new item for each request, because I want the same item three times with small differences (price, size, color, etc.).

This is my output:

sizes= [u'28', u'24', u'26']
skuIds= [u'123667829161', u'123667829162', u'123667829163']
Sku before overwriting None
size before overwriting None
Sku after overwriting 123667829161
Size after overwriting 28
Yielding item
Sku before overwriting 123667829161
size before overwriting 28
Sku after overwriting 123667829162
Size after overwriting 24
Yielding item
Sku before overwriting 123667829162
size before overwriting 24
Sku after overwriting 123667829163
Size after overwriting 26
Yielding item

Item in parse_price {'product_url': 'http://urun.n11.com/tava/tefal-titanium-hard-tava-no24-26-28-P99340786',
'size': u'26',
'sku': u'123667829163'}

Item in parse_price {'discounted_price': u'81,50 TL',
'product_url': 'http://urun.n11.com/tava/tefal-titanium-hard-tava-no24-26-28-P99340786',
'real_price': u'81,50 TL',
'size': u'26',
'sku': u'123667829163'}

Item in parse_price {'discounted_price': u'69,50 TL',
'product_url': 'http://urun.n11.com/tava/tefal-titanium-hard-tava-no24-26-28-P99340786',
'real_price': u'69,50 TL',
'size': u'26',
'sku': u'123667829163'}

As you can see from the output, everything in the meta item gets overwritten with the data from the last request, for some reason.

I tried debugging it myself, reading scrapy docs, and looking over here on SO, but I just cannot find an answer to this. I'm missing something and I need some outside help.

Thanks a lot!

Answers (1)

rfelten

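Your issue is not related to the meta mechanism or Scrapy at all. Keep in mind that you have created a single item with single attributes (size, sku, etc.). Your zip() loop just sets/overwrites these attributes before yielding the requests, and every request's meta holds a reference to that same item, not a copy. As your output shows, the loop finishes before any parse_price callback runs, so every callback sees the values from the last iteration.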
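The effect can be reproduced without Scrapy. In this minimal plain-Python sketch, a dict stands in for N11ProductItem and a list of dicts stands in for the meta of each yielded Request; build_metas is a hypothetical helper. With copy_per_request=False every meta points at the same object, as in the question. The copy_per_request=True branch shows a different fix from the nested structure suggested below: giving each request its own copy of the item (copy.deepcopy here; the same idea is assumed to apply to a dict-like Scrapy Item).

```python
import copy

# Values taken from the question's output.
PRODUCT_URL = ("http://urun.n11.com/tava/"
               "tefal-titanium-hard-tava-no24-26-28-P99340786")
SKU_IDS = ["123667829161", "123667829162", "123667829163"]
SIZES = ["28", "24", "26"]


def build_metas(copy_per_request):
    """Mimic the loop in parse(): build one meta dict per 'request'."""
    item = {"product_url": PRODUCT_URL}  # stands in for N11ProductItem()
    metas = []
    for sku, size in zip(SKU_IDS, SIZES):
        item["sku"] = sku
        item["size"] = size
        # copy_per_request=False reproduces meta={'item': item}: every
        # meta dict points at the SAME object. True gives each its own copy.
        stored = copy.deepcopy(item) if copy_per_request else item
        metas.append({"item": stored})
    return metas


if __name__ == "__main__":
    # Like the callbacks, read meta only after the loop has finished.
    print([m["item"]["size"] for m in build_metas(False)])  # ['26', '26', '26']
    print([m["item"]["size"] for m in build_metas(True)])   # ['28', '24', '26']
```

Copying per request yields three independent items, one per size, which matches "the same item three times with small differences".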

If you really want to create (only) a single N11ProductItem() for multiple sizes/product instances, you can't use a single attribute to store different sizes/skus. You need a nested structure like a dict or list to achieve this.
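A minimal sketch of the nested approach, again in plain Python with hypothetical names (new_product, handle_price and the "variants" field are illustrations, not Scrapy API). The item is shared on purpose, but each callback writes only to the sub-dict for its own skuId, so nothing gets overwritten. In a spider, each Request's meta would carry the item plus the skuId (for example a hypothetical meta={'item': item, 'sku': sku}), and parse_price would yield the item only once handle_price returns it.

```python
def new_product(product_url, sku_ids, sizes):
    """Create ONE item for the whole product; per-size data is nested."""
    return {
        "product_url": product_url,
        # Hypothetical field: maps skuId -> data for that size/variant.
        "variants": {sku: {"size": size} for sku, size in zip(sku_ids, sizes)},
    }


def handle_price(item, sku, real_price, discounted_price):
    """Stand-in for parse_price(): fill in one variant.

    Returns the item once every variant has a price, otherwise None.
    """
    variant = item["variants"][sku]
    variant["real_price"] = real_price
    variant["discounted_price"] = discounted_price
    if all("real_price" in v for v in item["variants"].values()):
        return item
    return None


if __name__ == "__main__":
    item = new_product("http://urun.n11.com/tava/"
                       "tefal-titanium-hard-tava-no24-26-28-P99340786",
                       ["123667829161", "123667829162", "123667829163"],
                       ["28", "24", "26"])
    # Responses may arrive in any order; the prices here are placeholders.
    for sku in ["123667829162", "123667829163", "123667829161"]:
        done = handle_price(item, sku, "<price>", "<price>")
    print(done is item)
    print(sorted(v["size"] for v in item["variants"].values()))
```

One caveat with this design: if any of the AJAX requests fails, the item is never completed, so a real spider would also need to handle failures (for example in an errback).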
