Vladimir Tsyupko

Reputation: 163

Requests stops when multiple requests are sent in a row

I'm parsing online shops using Scrapy and python-requests. After I get all the info, I make one more request with python-requests to get the quantity. After several minutes the spider stops working, and I don't know what is causing the trouble. Any suggestions?

Scrapy Log:

2014-05-08 15:27:57+0300 [scrapy] DEBUG: Start adding sku1270594 to a cart.
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): www.sds.com.au
DEBUG:requests.packages.urllib3.connectionpool:"GET /product/trefoil-tee-by-adidas-in-black-camo-grey HTTP/1.1" 200 20223
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): www.sds.com.au
DEBUG:requests.packages.urllib3.connectionpool:"POST /common/ajaxResponse.jsp;jsessionid=34E95C7662D0F5084FF971CC5693E6E8.store-node1?_DARGS=/browse/product.jsp.addToCartForm HTTP/1.1" 200 146
2014-05-08 15:27:59+0300 [scrapy] DEBUG: End adding sku1270594 to a cart.
2014-05-08 15:27:59+0300 [scrapy] DEBUG: Success. quantity of sku1270594 is 16.
2014-05-08 15:28:00+0300 [sds] DEBUG: Updating  product info sku1270594
2014-05-08 15:28:00+0300 [sds] DEBUG: Added new price sku1270594
2014-05-08 15:28:00+0300 [sds] DEBUG: Scraped from <200 http://www.sds.com.au/product/trefoil-tee-by-adidas-in-black-camo-grey>
2014-05-08 15:28:00+0300 [sds] DEBUG: Updating  product info sku901159
2014-05-08 15:28:00+0300 [sds] DEBUG: Added new price sku901159
2014-05-08 15:28:00+0300 [sds] DEBUG: Scraped from <200 http://www.sds.com.au/product/two-palm-tee-by-folke-in-chalk>
2014-05-08 15:28:00+0300 [sds] DEBUG: Updating  product info sku901163
2014-05-08 15:28:00+0300 [sds] DEBUG: Added new price sku901163
2014-05-08 15:28:00+0300 [sds] DEBUG: Scraped from <200 http://www.sds.com.au/product/two-palm-tee-by-folke-in-chalk>
2014-05-08 15:28:00+0300 [scrapy] DEBUG: Start adding sku1270591 to a cart.
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): www.sds.com.au
DEBUG:requests.packages.urllib3.connectionpool:"GET /product/trefoil-tee-by-adidas-in-black-camo-grey HTTP/1.1" 200 20225
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): www.sds.com.au

And that's it. Nothing happens in the console anymore. Here's the function that gets the quantity:

def get_qty(self, item):
    r = requests.get(item['url'])
    cookie_cart_user = dict(r.cookies)
    sel = Selector(text=r.text, type="html")
    session = sel.xpath('//input[@name="_dynSessConf"]/@value').extract()[0]
    # print session
    # print cookie_cart_user
    add_to_cart_url = 'http://www.sds.com.au/common/ajaxResponse.jsp;jsessionid=%s?_DARGS=/browse/product.jsp.addToCartForm' % cookie_cart_user['JSESSIONID']
    # ok, so we're adding one item
    log.msg("Adding %s to a cart." % item['internal_id'], log.DEBUG)
    headers = {
        'User-Agent': USER_AGENT,
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Connection': 'close',
    }
    s = requests.session()
    s.keep_alive = False
    r = requests.post(add_to_cart_url,
                      data=self.generate_form_data(item, 10000, session),
                      cookies=cookie_cart_user,
                      headers=headers,
                      timeout=10)
    response = r.json()
    r.close()
    try:
        quantity = int(re.findall(u'\d+', response['formErrors'][0]['errorMessage'])[0])
        log.msg("Success. quantity of %s is %s." % (item['internal_id'], quantity), log.DEBUG)
        return quantity
    except Exception, e:
        log.msg('Error getting data-cart-item on product %s. Error: %s' % (item['internal_id'], str(e)), log.ERROR)
        with open("log/%s.html" % item['internal_id'], "w") as myfile:
            myfile.write('%s' % r.text.encode('utf-8'))

Upvotes: 2

Views: 2162

Answers (1)

Vladimir Tsyupko

Reputation: 163

Well, Jan Vlcinsky recommended digging deeper into the logging of requests. After some digging I decided to reorganize my code a little, which gave me the right answer, and now everything works great.

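# Module-level imports used by the method below
import logging
import re

import requests
from requests.adapters import HTTPAdapter
from scrapy import log
from scrapy.selector import Selector

def get_qty(self, item):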
    log.msg("Start adding %s to a cart." % item['internal_id'], log.DEBUG)
    logging.basicConfig(level=logging.DEBUG)
    sess = requests.Session()
    sess.keep_alive = False
    adapter = HTTPAdapter(max_retries=50)
    sess.mount('http://', adapter)
    r = sess.get(item['url'])
    cookie_cart_user = dict(r.cookies)
    sel = Selector(text=r.text, type="html")
    session = sel.xpath('//input[@name="_dynSessConf"]/@value').extract()[0]
    add_to_cart_url = 'http://www.sds.com.au/common/ajaxResponse.jsp;jsessionid=%s?_DARGS=/browse/product.jsp.addToCartForm' % cookie_cart_user['JSESSIONID']
    headers = {
        'User-Agent': USER_AGENT,
        'Accept': 'application/json, text/javascript, */*; q=0.01',
    }
    r = sess.post(add_to_cart_url,
                  data=self.generate_form_data(item, 10000, session),
                  cookies=cookie_cart_user,
                  headers=headers)
    log.msg("End adding %s to a cart." % item['internal_id'], log.DEBUG)
    try:
        response = r.json()
        r.close()
        quantity = int(re.findall(u'\d+', response['formErrors'][0]['errorMessage'])[0])
        log.msg("Success. quantity of %s is %s." % (item['internal_id'], quantity), log.DEBUG)
        return quantity
    except Exception as e:
        log.msg('Error getting data-cart-item on product %s. Error: %s' % (item['internal_id'], str(e)), log.ERROR)
        with open("log/%s.html" % item['internal_id'], "w") as myfile:
            myfile.write('%s' % r.text.encode('utf-8'))

And now, if an error occurs, the log says:

2014-05-08 16:00:10+0300 [scrapy] DEBUG: Start adding sku1210352 to a cart.
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): www.sds.com.au
WARNING:requests.packages.urllib3.connectionpool:Retrying (50 attempts remain) after connection broken by 'error(60, 'Operation timed out')': /product/startlet-gilet-fleece-jacket-by-zoo-york-in-black
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (2): www.sds.com.au
DEBUG:requests.packages.urllib3.connectionpool:"GET /product/startlet-gilet-fleece-jacket-by-zoo-york-in-black HTTP/1.1" 200 20278
DEBUG:requests.packages.urllib3.connectionpool:"POST /common/ajaxResponse.jsp;jsessionid=EEA02CE768B288DD302896F6A8C4780F.store-node2?_DARGS=/browse/product.jsp.addToCartForm HTTP/1.1" 200 145
2014-05-08 16:01:14+0300 [scrapy] DEBUG: End adding sku1210352 to a cart.

After that it retries and continues as if nothing happened.
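
For anyone who wants a bounded version of the same idea, here is a minimal sketch rather than the exact code from my spider. It assumes a reasonably recent requests (2.4 or later, for tuple timeouts) together with urllib3's Retry class. The names build_session and fetch and the timeout values are hypothetical. Note that requests applies no timeout unless one is passed, so a request without it can wait indefinitely on a stalled connection.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Hypothetical defaults, not taken from the original spider.
CONNECT_TIMEOUT = 5   # seconds to wait for the TCP connection
READ_TIMEOUT = 10     # seconds to wait between bytes of the response


def build_session(retries=3, backoff=0.5):
    """Return a Session whose HTTP(S) adapter retries failed connections."""
    retry = Retry(total=retries, backoff_factor=backoff)
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # The same adapter instance serves both schemes.
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session


def fetch(session, url):
    """GET a URL with an explicit timeout; requests sets none by default."""
    return session.get(url, timeout=(CONNECT_TIMEOUT, READ_TIMEOUT))
```

With this, get_qty could create the session once via build_session() and route both the GET and the POST through it, passing the same timeout to each call. A small retry count with a backoff is probably gentler on the shop's server than 50 immediate retries, though the right numbers depend on the site.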

Upvotes: 2
