Pathos.multiprocessing + memory leakage + amap

Question

I'm building a web scraper and I have a method I am running concurrently as follows:

def parallel_scrape(): 
    p= Pool()
    results = p.amap(self.fetch, domain_list]
    while not results.ready():
           time.sleep(5)

It works as expected besides from the fact that it consumes too much memory. Actually it seems as though it will consume all available memory. I tried calling the garbage collector manually but it has no effect what so ever. I then modified the code as follows:

def parallel_scrape(): 
        p= Pool()
        results = p.amap(self.fetch, domain_list]
        while not results.ready():
               time.sleep(5)
               p.terminate()
               p.restart()

This stops the program entirely after returning the first 6 domains. Can anyone help?

Pathos.multiprocessing + memory leakage + amap

Answers (0)

Related Questions