Reputation: 903
I'm building a web scraper and I have a method I am running concurrently as follows:
from pathos.multiprocessing import ProcessingPool as Pool  # amap/restart are pathos APIs
import time

def parallel_scrape(self):
    p = Pool()
    results = p.amap(self.fetch, domain_list)  # schedule fetches asynchronously
    while not results.ready():                 # poll until all fetches complete
        time.sleep(5)
It works as expected, apart from the fact that it consumes too much memory. In fact, it seems it will eventually consume all available memory. I tried calling the garbage collector manually, but it had no effect whatsoever.
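For reference, the manual collection attempt looked roughly like this (the exact placement inside the polling loop is approximate):

import gc

while not results.ready():
    gc.collect()   # force a full collection pass; this freed no memory in practice
    time.sleep(5)

I then modified the code as follows: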
def parallel_scrape(self):
    p = Pool()
    results = p.amap(self.fetch, domain_list)
    while not results.ready():
        time.sleep(5)
    # tear the pool down and bring it back, hoping to release worker memory
    p.terminate()
    p.restart()
This stops the program entirely after returning the first 6 domains.
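To show what I'm ultimately after, here is a minimal sketch of the batched variant I have in mind: the pool is torn down and restarted between fixed-size batches so that worker memory is released. The batch size of 50 and the pathos import are assumptions on my part; self.fetch and domain_list are as above.

from itertools import islice
import time

from pathos.multiprocessing import ProcessingPool as Pool

def parallel_scrape(self):
    p = Pool()
    it = iter(domain_list)
    while True:
        batch = list(islice(it, 50))   # next 50 domains; 50 is arbitrary
        if not batch:
            break
        results = p.amap(self.fetch, batch)
        while not results.ready():
            time.sleep(5)
        results.get()    # collect this batch before tearing the pool down
        p.terminate()    # kill the workers, releasing whatever they hold
        p.restart()      # fresh workers for the next batch

Can anyone help?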
Upvotes: 2
Views: 120