Reputation: 51
I'm trying to scrape this category of the n11.com site: n11.com/mutfak-gerecleri
Some products come in multiple sizes, each with its own price, like this one: http://urun.n11.com/tava/tefal-titanium-hard-tava-no24-26-28-P99340786
The price for each size is loaded by this AJAX call: http://urun.n11.com/sku/stok/99340786/123667829161?newDesign=true&isFashion=false, where the first number after stok is the product id (pid) and the second one is the skuId. The pid can be found in the product URL and the skuIds can be found in the HTML (search for skuList). In this case the call is triggered by changing the size of the pan.
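A minimal sketch of how the AJAX URL can be built from the product URL and a skuId. It assumes the pid is always the digits after the final "-P" at the end of the product URL (true for the example above, but not verified for every product). The names AJAX_URL, product_id_from_url and sku_price_url are hypothetical; the template mirrors the ajax_call used in the spider, whose exact definition isn't shown.

```python
import re

# Hypothetical template, copied from the AJAX call observed in the browser.
AJAX_URL = "http://urun.n11.com/sku/stok/{pid}/{sku}?newDesign=true&isFashion=false"


def product_id_from_url(product_url):
    """Return the pid, assuming it is the digits after the final '-P' in the URL."""
    match = re.search(r"-P(\d+)$", product_url)
    if match is None:
        raise ValueError("no product id found in %r" % product_url)
    return match.group(1)


def sku_price_url(product_url, sku_id):
    """Build the AJAX URL that returns the price for one skuId of a product."""
    return AJAX_URL.format(pid=product_id_from_url(product_url), sku=sku_id)


product_url = "http://urun.n11.com/tava/tefal-titanium-hard-tava-no24-26-28-P99340786"
print(sku_price_url(product_url, "123667829161"))
```

For the pan above this produces the same URL as the call captured from the page. If product URLs can carry a query string or trailing slash, the regex anchor would need loosening.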
This is the code that makes the AJAX calls and yields requests to the new callback:
def parse(self, response):
    # sizes, skuIds, pid and ajax_call are defined elsewhere (not shown here)
    item = N11ProductItem()
    item['product_url'] = response.url
    print 'sizes=', sizes
    print 'skuIds=', skuIds
    print
    for sku, size in zip(skuIds, sizes):
        print 'Sku before overwriting', item.get('sku')
        print 'size before overwriting', item.get('size')
        item['sku'] = sku
        item['size'] = size
        print 'Sku after overwriting', item['sku']
        print 'Size after overwriting', item['size']
        r = Request(
            url=ajax_call.format(pid=pid, sku=sku),
            meta={
                'item': item, 'dont_merge_cookies': True,
            },
            headers={
                'X-Requested-With': 'XMLHttpRequest'
            },
            callback=self.parse_price)
        print "yielding item"
        yield r
And the parse_price method:
def parse_price(self, response):
    item = response.meta['item']
    print "Item in parse_price method: \n", item
    real_price = response.xpath("//div[@id='oldPrice']/text()").extract()
    item['real_price'] = real_price[0] if real_price else ''
    display_price = response.xpath("//div[@id='displayPrice']/text()").extract()
    item['discounted_price'] = display_price[0] if display_price else ''
    yield item
I should mention that I tried the code both with and without the dont_merge_cookies meta key and the dont_filter request parameter.
I am not creating a new item for each SKU because I want the same item three times with small differences (price, size, color, etc.).
This is my output:
sizes= [u'28', u'24', u'26']
skuIds= [u'123667829161', u'123667829162', u'123667829163']
Sku before overwriting None
size before overwriting None
Sku after overwriting 123667829161
Size after overwriting 28
Yielding item
Sku before overwriting 123667829161
size before overwriting 28
Sku after overwriting 123667829162
Size after overwriting 24
Yielding item
Sku before overwriting 123667829162
size before overwriting 24
Sku after overwriting 123667829163
Size after overwriting 26
Yielding item
Item in parse_price {'product_url': 'http://urun.n11.com/tava/tefal-titanium-hard-tava-no24-26-28-P99340786',
'size': u'26',
'sku': u'123667829163'}
Item in parse_price {'discounted_price': u'81,50 TL',
'product_url': 'http://urun.n11.com/tava/tefal-titanium-hard-tava-no24-26-28-P99340786',
'real_price': u'81,50 TL',
'size': u'26',
'sku': u'123667829163'}
Item in parse_price {'discounted_price': u'69,50 TL',
'product_url': 'http://urun.n11.com/tava/tefal-titanium-hard-tava-no24-26-28-P99340786',
'real_price': u'69,50 TL',
'size': u'26',
'sku': u'123667829163'}
As you can see from the output, for some reason everything in the meta item gets overwritten with the data from the last request: every callback sees sku 123667829163 and size 26.
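The same symptom can be reproduced in plain Python, without Scrapy. In this sketch FakeRequest is a hypothetical stand-in for scrapy.Request that only keeps its url and meta, and the list() call forces all requests to be created before any "callback" runs, which matches the order shown in the output (all three "Yielding item" lines come before the first parse_price line).

```python
from collections import namedtuple

# Hypothetical stand-in for scrapy.Request: it just stores a reference to meta.
FakeRequest = namedtuple("FakeRequest", ["url", "meta"])


def parse(sku_ids, sizes):
    item = {}  # ONE dict, created once, before the loop
    for sku, size in zip(sku_ids, sizes):
        item["sku"] = sku    # overwrites the same dict on every iteration
        item["size"] = size
        # The URL string is built now, but meta only points at the shared dict.
        yield FakeRequest(url="sku/" + sku, meta={"item": item})


# Create every request first, as happens before any response comes back.
requests = list(parse(["123667829161", "123667829162", "123667829163"],
                      ["28", "24", "26"]))

# The "callbacks" run afterwards and all see the last sku and size.
for r in requests:
    print("%s %r" % (r.url, r.meta["item"]))
```

Each request still targets a different SKU (which is why the prices differ in the output), but all three meta entries refer to the same dict, so they all report the values from the last iteration.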
I tried debugging it myself, reading the Scrapy docs, and searching here on SO, but I just cannot find an answer to this. I'm missing something and I need some outside help.
Thanks a lot!
Upvotes: 1
Views: 230
Reputation: 181
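Your issue is not related to the meta mechanism or to Scrapy at all. Keep in mind that you have created a single item with single-valued attributes (size, sku, etc.). Your zip() loop just sets / overwrites these attributes before yielding each request, and every request's meta holds a reference to that same item, not a copy. By the time the parse_price callbacks run, the loop has already finished, so they all see the values from the last iteration.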
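If the goal is the same item three times with small differences, one option is to give each request its own item. A minimal sketch with plain dicts standing in for N11ProductItem (values taken from the question's output); in the spider, the equivalent would be creating a fresh N11ProductItem, or a copy of a base one, inside the loop before putting it into meta.

```python
product_url = "http://urun.n11.com/tava/tefal-titanium-hard-tava-no24-26-28-P99340786"
sizes = ["28", "24", "26"]
sku_ids = ["123667829161", "123667829162", "123667829163"]

base = {"product_url": product_url}  # fields shared by every variant

variants = []
for sku, size in zip(sku_ids, sizes):
    item = dict(base)     # new shallow copy per SKU, so later iterations can't touch it
    item["sku"] = sku
    item["size"] = size
    variants.append(item)  # in the spider, this item would go into the Request's meta

for v in variants:
    print("%s %s" % (v["sku"], v["size"]))
```

dict(base) is a shallow copy: if the shared fields ever hold lists or dicts that the callbacks mutate, copy.deepcopy would be needed instead.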
If you really want to create (only) a single N11ProductItem() for multiple sizes / product instances, you can't use a single scalar attribute to store different sizes / skus. You need a nested structure like a dict or list to achieve this.
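A minimal sketch of the nested-structure approach, with plain dicts and hypothetical helper names (start_item, parse_price). It assumes the item can hold a variants field (on a real N11ProductItem that Field would have to be declared). Here sharing one item across requests is intended: each callback writes only to its own SKU's slot, and the item is returned only once every variant has a price, so it would be yielded once regardless of the order in which the responses arrive.

```python
def start_item(product_url, sku_ids, sizes):
    """One item for the whole product, with a nested dict of variants keyed by skuId."""
    return {
        "product_url": product_url,
        "variants": {sku: {"size": size} for sku, size in zip(sku_ids, sizes)},
    }


def parse_price(item, sku, display_price):
    """Hypothetical callback body: store the price in this SKU's own slot.

    Returns the item once every variant has a price, otherwise None.
    """
    item["variants"][sku]["discounted_price"] = display_price
    if all("discounted_price" in v for v in item["variants"].values()):
        return item
    return None


sku_ids = ["123667829161", "123667829162", "123667829163"]
item = start_item(
    "http://urun.n11.com/tava/tefal-titanium-hard-tava-no24-26-28-P99340786",
    sku_ids,
    ["28", "24", "26"],
)
# Placeholder price strings; the real ones come from each AJAX response.
for sku in sku_ids:
    finished = parse_price(item, sku, "price for " + sku)
print(finished is item)
```

One caveat of this design: if one of the AJAX requests fails, the item would never be emitted unless an errback handles that case.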
Upvotes: 1