I am trying to download image binaries from a list of URLs in parallel, using requests or urllib inside a multiprocessing Pool.
The code looks like this:
from multiprocessing import Pool

class Myclass:
    def __init__(self, image_list):
        self.__image_list = image_list

    def run(self, image_list):
        pool = Pool(30)
        for _ in pool.imap_unordered(self.get_from_url, self.__image_list):
            pass

    def get_from_url(self, image_list):
        # codes of HTTP request (either using requests or urllib)
        pass
It turns out to be far too slow with this code.
But when I move the method get_from_url outside the class, it works fine:
from multiprocessing import Pool

def get_from_url(image_list):
    # codes of HTTP request (either using requests or urllib)
    pass

class Myclass:
    def __init__(self, image_list):
        self.__image_list = image_list

    def run(self, image_list):
        pool = Pool(30)
        for _ in pool.imap_unordered(get_from_url, self.__image_list):
            pass
What is the difference between these two approaches, and why is one slower than the other?
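In the first case, get_from_url is a regular (non-static) method of Myclass, so self.get_from_url is a bound method that carries a reference to self. To call it in a worker process, the pool has to pickle that bound method, which means pickling the whole instance with all of its attributes (including the entire self.__image_list), sending it to the worker process, and unpickling it there. Because imap_unordered sends the function along with every task, with the default chunksize of 1 this happens once per URL in the list, not just once.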
In the second case, get_from_url is a module-level function, which pickle serializes by reference (just its module and name), so the only real data transferred to the Pool's worker processes is the argument for each call to get_from_url.
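To see the difference in payload size, the sketch below pickles a bound method and a module-level function directly, which is what the pool does with the function before sending each task to a worker. Fetcher, fetch and fetch_module_level are hypothetical stand-ins for Myclass and get_from_url, and the 10,000 example.com URLs are made-up data; exact byte counts will vary between Python versions.

```python
import pickle


def fetch_module_level(url):
    # Module-level function: pickled by reference (module name + function name).
    return url


class Fetcher:
    # Hypothetical stand-in for Myclass.
    def __init__(self, urls):
        self.__urls = urls

    def fetch(self, url):
        # Regular method: self.fetch is a bound method that carries self.
        return url


# Made-up data standing in for the real image list.
urls = ["https://example.com/img%d.jpg" % i for i in range(10_000)]
fetcher = Fetcher(urls)

# Pickling the bound method also pickles the instance and its whole URL list.
bound_size = len(pickle.dumps(fetcher.fetch))
# Pickling the plain function stores only a reference to it.
plain_size = len(pickle.dumps(fetch_module_level))

print("bound method:", bound_size, "bytes")
print("module-level function:", plain_size, "bytes")
```

In CPython's multiprocessing.pool, the bound-method payload is paid for every task, so with N URLs and chunksize=1 the parent process ends up serializing the full list roughly N times.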
You could define get_from_url as a @staticmethod to keep it within the class while preventing the whole of self from being pickled:
from multiprocessing import Pool

class Myclass:
    def __init__(self, image_list):
        self.__image_list = image_list

    def run(self, image_list):
        pool = Pool(30)
        for _ in pool.imap_unordered(self.get_from_url, self.__image_list):
            pass

    @staticmethod
    def get_from_url(image_list):
        # codes of HTTP request (either using requests or urllib)
        pass
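A quick way to check that the @staticmethod version no longer drags self along is to pickle both kinds of method taken from the same instance. This is a minimal sketch; Fetcher, fetch_bound and fetch_static are hypothetical names, and the URL list is made-up data.

```python
import pickle


class Fetcher:
    # Hypothetical stand-in for Myclass.
    def __init__(self, urls):
        self.__urls = urls

    def fetch_bound(self, url):
        # Bound method: pickling self.fetch_bound also pickles self.
        return url

    @staticmethod
    def fetch_static(url):
        # Static method: self.fetch_static is a plain function, pickled by name.
        return url


fetcher = Fetcher(["https://example.com/img%d.jpg" % i for i in range(10_000)])

static_size = len(pickle.dumps(fetcher.fetch_static))
bound_size = len(pickle.dumps(fetcher.fetch_bound))

# Unpickling looks the function up again by its qualified name,
# so it comes back as the very same function object.
restored = pickle.loads(pickle.dumps(fetcher.fetch_static))

print("static method:", static_size, "bytes")
print("bound method:", bound_size, "bytes")
print("same function after round trip:", restored is Fetcher.fetch_static)
```

Accessing a static method through self still yields a plain function, which pickle stores by its module and qualified name (Myclass.get_from_url in the answer's code), so the URL argument is the only per-task data that has to be sent.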