Javier Kim

Reputation: 13

multiprocessing Pool slower with method than function

I am trying to download the binary content of images from URLs, using either requests or urllib.

The code looks like this:

from multiprocessing import Pool

class Myclass:
  def __init__(self, image_list):
    self.__image_list = image_list

  def run(self, image_list):
    pool = Pool(30)
    for _ in pool.imap_unordered(self.get_from_url, self.__image_list):
      pass

  def get_from_url(self, image_list):
    # code for the HTTP request (using either requests or urllib)

This version turns out to be very slow.

But when I move the method (get_from_url) out of the class and make it a module-level function, it runs fine:

from multiprocessing import Pool

def get_from_url(image_list):
  # code for the HTTP request (using either requests or urllib)

class Myclass:
  def __init__(self, image_list):
    self.__image_list = image_list

  def run(self, image_list):
    pool = Pool(30)
    for _ in pool.imap_unordered(get_from_url, self.__image_list):
      pass

What is the difference between these two approaches, and why is one slower than the other?

Upvotes: 1

Views: 164

Answers (1)

D Hudson

Reputation: 1092

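In the first case, get_from_url is a non-static method on Myclass, so self.get_from_url is a bound method that carries a reference to the instance. For it to be called in a worker process, the entire instance (with all its attributes, including the whole of self.__image_list) needs to be pickled, sent to the worker process and unpickled there. The function is sent along with every task the pool dispatches, so the full list is serialized again and again, and the overhead grows with the length of the list.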

In the second case, the function is not tied to an instance, so it is pickled as a short reference (its module and name), and the only real data transferred to the Pool's worker processes is the argument(s) for get_from_url.
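The size difference can be checked without starting a Pool at all, by pickling the two callables directly. This is only a rough sketch: the function bodies are stubs that make no HTTP request, and the example.com URLs and the list length of 10,000 are made up for illustration.

```python
import pickle


def get_from_url(url):
    # Stub for the module-level function; no HTTP request is made here.
    return url


class Myclass:
    def __init__(self, image_list):
        self.__image_list = image_list

    def get_from_url(self, url):
        # Stub for the in-class method; no HTTP request is made here.
        return url


# Made-up URLs, used only to give the instance a large attribute.
urls = ["http://example.com/%d.jpg" % i for i in range(10_000)]
obj = Myclass(urls)

# A bound method pickles as (instance, method name), so the payload
# contains every attribute of the instance, including the full list.
bound_size = len(pickle.dumps(obj.get_from_url))

# A module-level function pickles as a reference: module plus name.
plain_size = len(pickle.dumps(get_from_url))

print(f"bound method: {bound_size} bytes, plain function: {plain_size} bytes")
```

The bound method's payload should come out several orders of magnitude larger than the plain function's, and the pool pays that cost repeatedly rather than once.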

You could define get_from_url as a @staticmethod to keep it within the class but prevent the whole of self being pickled:

from multiprocessing import Pool

class Myclass:
  def __init__(self, image_list):
    self.__image_list = image_list

  def run(self, image_list):
    pool = Pool(30)
    for _ in pool.imap_unordered(self.get_from_url, self.__image_list):
      pass

  @staticmethod
  def get_from_url(image_list):
    # code for the HTTP request (using either requests or urllib)
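To see why the @staticmethod version avoids the problem, here is a small check, again with a stub body and made-up example.com URLs rather than real request code. Looking up a static method through an instance returns the plain function, with no reference to self attached, so pickle stores it by name only.

```python
import pickle


class Myclass:
    def __init__(self, image_list):
        self.__image_list = image_list

    @staticmethod
    def get_from_url(url):
        # Stub body; no HTTP request is made here.
        return url


# Made-up URLs, used only to give the instance a large attribute.
obj = Myclass(["http://example.com/%d.jpg" % i for i in range(10_000)])

# Accessing a staticmethod through the instance yields the underlying
# function itself, not a bound method holding the instance.
func = obj.get_from_url
same_object = func is Myclass.get_from_url

# The function is pickled by qualified name, so the payload stays tiny
# regardless of how much data the instance holds.
static_size = len(pickle.dumps(func))
restored = pickle.loads(pickle.dumps(func))

print(f"same object: {same_object}, pickled size: {static_size} bytes")
```

The trade-off is that the static method can no longer read anything from self, so any per-instance state it needs has to be passed in as an argument.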

Upvotes: 2
