Reputation: 10470
I wanted to speed up a Python script I have that iterates over 300 records, so I figured I'd try threading. My non-threaded version takes just under 1 minute to execute; my threaded version is only about 1 second faster. Here are the pertinent parts of the threaded version of the script:
... other imports ...
import threading
import concurrent.futures

# global vars
threads = []
check_records = []
default_max_problems = 5
problems_found = 0
lock = threading.Lock()

... some functions ...

def check_host(rec):
    with lock:
        global problems_found
        global max_problems
        if problems_found >= max_problems:
            # I'd prefer to stop all threads and stop new ones from starting,
            # but I don't know how to do that.
            return
        ... bunch of function calls that do network stuff ...
        check_records.append(rec)
        if not (reachable and dns_ready):
            problems_found += 1
        logging.debug(f"check_host problems_found is {problems_found}.")

if __name__ == '__main__':
    ... handle command line args ...
    try:
        with concurrent.futures.ThreadPoolExecutor() as executor:
            for ip in get_ips():
                req_rec = find_dns_req_record(ip, dns_record_reqs)
                executor.submit(check_host, req_rec)
Why is the performance of my threaded script almost the same as my non-threaded version?
Upvotes: 2
Views: 878
Reputation: 1172
The kind of work you are performing is important to answer the question. If you are performing many IO-bound tasks (network calls, disk reads, etc.), then using Python's multi-threading should provide a good speed increase, since you can now have multiple threads waiting for multiple IO calls.
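As a minimal illustration (not your code; it uses time.sleep as a stand-in for a network call), here is why threads help when the work is mostly waiting on IO:

import concurrent.futures
import time

def fake_io_call(n):
    # stand-in for a network call: the thread just waits
    time.sleep(1)
    return n

start = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(fake_io_call, range(10)))
# ten one-second waits overlap, so this prints roughly 1s instead of 10s
print(f"finished in {time.time() - start:.1f}s")

While one thread is blocked waiting for a response, the GIL is released and another thread can start its own wait, which is where the speedup comes from.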
However, if you are performing raw computation, then multi-threading won't help you, because of Python's GIL (global interpreter lock), which essentially allows only one thread to execute Python code at a time. To speed up non-IO-bound computation, you will need to use the multiprocessing module and spin up multiple Python processes. One of the disadvantages of multiple processes vs. multiple threads is that it is harder to share data/memory between processes (because they have separate address spaces), whereas threads share memory because they are part of the same process.
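For example, a rough sketch of farming CPU-bound work out to processes with concurrent.futures.ProcessPoolExecutor (cpu_heavy is just an illustrative function, not anything from your script):

import concurrent.futures

def cpu_heavy(n):
    # pure computation: a thread doing this would hold the GIL the whole time
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    # each task runs in its own Python process with its own GIL
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = list(executor.map(cpu_heavy, [10_000_000] * 4))
    print(results)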
Another important thing to consider is how you are using locks. If you put too much code under a lock, then threads won't be able to execute that code concurrently. You should keep the smallest amount of code possible under any given lock, and only where shared data is accessed. If your entire thread function body is under a lock, then you eliminate the potential for any speed improvement via multi-threading, because only one thread can be inside the function at a time; see the sketch below.
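One way you might restructure check_host from the question along those lines, as a sketch only (do_network_checks is a hypothetical placeholder for your elided network calls, assumed to produce reachable and dns_ready):

def check_host(rec):
    global problems_found

    # hold the lock only long enough to read the shared counter
    with lock:
        if problems_found >= max_problems:
            return

    # hypothetical placeholder for the "... bunch of function calls that do
    # network stuff ..." -- these now run OUTSIDE the lock, so the threads'
    # network waits can overlap
    reachable, dns_ready = do_network_checks(rec)

    # lock again only while touching shared state
    with lock:
        check_records.append(rec)
        if not (reachable and dns_ready):
            problems_found += 1
        logging.debug(f"check_host problems_found is {problems_found}.")

With the lock held only around the shared-state reads and updates, the slow network calls from different threads can actually run at the same time, which is where your missing speedup should come from.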
Upvotes: 2