RedVelvet

Reputation: 1903

Does Scrapy have an item limit?

These days I'm writing a spider with Scrapy in Python. It's basically a simple spider class that does some simple parsing of a few fields in an HTML page. I don't use Scrapy's start_urls list; instead I use a personalized list like this:

class start_urls_mod():
    def __init__(self, url, data):
        self.url = url
        self.data = data

# Defined in the spider class:
url_to_scrape = []
# Populated in the body like this:
self.url_to_scrape.append(start_urls_mod(url_found, str(data_found)))

and I pass the URLs like this:

for any_url in self.url_to_scrape:
    yield scrapy.Request(any_url.url, callback=self.parse_page)

It works fine with a limited number of URLs, say 3000.

But when I run a test that finds about 32532 URLs to scrape, the JSON output file contains only about 3000 scraped URLs.

My parse function calls itself recursively:

yield scrapy.Request(any_url.url, callback=self.parse_page)
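Putting the pieces together, the spider looks roughly like this (the field extraction and the way url_to_scrape gets filled are simplified placeholders, not my real logic):

import scrapy

class start_urls_mod():
    def __init__(self, url, data):
        self.url = url
        self.data = data

class MySpider(scrapy.Spider):
    name = "my_spider"

    # Defined in the class
    url_to_scrape = []

    def start_requests(self):
        # url_to_scrape has already been populated with start_urls_mod objects
        for any_url in self.url_to_scrape:
            yield scrapy.Request(any_url.url, callback=self.parse_page)

    def parse_page(self, response):
        # Simplified: yield one item per page
        yield {"url": response.url, "title": response.css("title::text").get()}
        # The function "recalls itself": follow links found on the page
        for href in response.css("a::attr(href)").getall():
            yield scrapy.Request(response.urljoin(href), callback=self.parse_page)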

So the question is: is there some memory limit on Scrapy items?

Upvotes: 0

Views: 4547

Answers (1)

eLRuLL

Reputation: 18799

No, not unless you have specified CLOSESPIDER_ITEMCOUNT in your settings.
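For reference, this is what such a limit would look like in settings.py (the number is just an example; the default is 0, meaning no limit):

# settings.py
# If set to a non-zero value, the spider closes after this many items
# have been scraped. The default is 0, i.e. no limit.
CLOSESPIDER_ITEMCOUNT = 3000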

Maybe Scrapy is finding duplicates among your requests; check whether the stats in your logs contain something like dupefilter/filtered.
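If duplicates turn out to be the cause and you really do want to request the same URL again, you can disable the filter per request (a sketch; usually the filter is doing you a favour):

yield scrapy.Request(any_url.url, callback=self.parse_page, dont_filter=True)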

Upvotes: 2
