Reputation: 387
I have a Python script doing data processing. On my laptop it would probably take 30 days to finish. A single function is executed hundreds of times inside a for-loop, and each call is fed a different argument.
I am thinking of designing a parallel/distributed setup to speed up the script: split the for-loop across multiple Docker containers, so that each container handles a subset of the loop iterations with its own arguments (a rough sketch of what I mean follows the pseudo code below).
Here is some pseudo code:
def single_fun(myargs):
    # do something here
    data = file_data(myargs)
    post_process(data)

def main_fun():
    for i in range(100):
        single_fun(i)
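A minimal sketch of the split I have in mind, assuming each container is launched with an explicit start/end range on the command line (the CLI flags, the chunking scheme, and the file_data / post_process stubs are illustrative assumptions, not the real script):

import argparse

def file_data(i):
    # stand-in for the real data-loading step
    return i

def post_process(data):
    # stand-in for the real post-processing step
    print(f"processed {data}")

def single_fun(myargs):
    # same body as the pseudo code above
    data = file_data(myargs)
    post_process(data)

def main_fun(start, end):
    # each container only walks its own slice of the original range(100)
    for i in range(start, end):
        single_fun(i)

if __name__ == "__main__":
    # hypothetical invocation: docker run my-image python script.py --start 0 --end 25
    parser = argparse.ArgumentParser()
    parser.add_argument("--start", type=int, required=True)
    parser.add_argument("--end", type=int, required=True)
    args = parser.parse_args()
    main_fun(args.start, args.end)

Launching four containers with ranges 0-25, 25-50, 50-75 and 75-100 would cover the whole loop; the same slicing could instead be driven by environment variables or a job queue rather than CLI flags.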
My question: Is this idea doable? Any useful feedback? How would I implement it, and is there a framework or tool I can leverage to do this? Thanks.
Upvotes: 1
Views: 1555