Abbie
Abbie

Reputation: 279

How do I use multiprocessing/multithreading to make my Python script quicker?

I am fairly new to Python and programming in general. I have written a script to go through a long list (~7000) of URLs and check their status to find any broken links. Predictably, this takes a few hours to request each URL one by one. I have heard that multiprocessing (or multithreading?) can be used to speed things up. What is the best approach to this? How many processes/threads should I run in one go? Do I have to create batches of URLs to check concurrently?

Upvotes: 0

Views: 136

Answers (1)

user4815162342
user4815162342

Reputation: 155406

The answer to the question depends on whether the process spends most of its time processing data or waiting for the network. If it is the former, then you need to use multiprocessing, and spawn about as many processes as you have physical cores on the system. Do not forget to make sure that you choose the appropriate algorithm for the task. Finally, if all else fails, coding parts of the program in C can be a viable solution as well.

If your program is slow because it spends a lot of time waiting for individual server responses, you can parallelize network access using threads or an asynchronous IO framework. In this case you can use many more threads than you have physical processor cores because most of the time your cores will be sleeping waiting for something interesting to happen. You will need to measure the results on your machine to find out the best number of threads that works for you.

Whatever you do, please make sure that your program is not hammering the remote servers with a large number of concurrent or repeated requests.

Upvotes: 3

Related Questions