E235
E235

Reputation: 13490

When using bruteforce with multithreading it slower than a single thread

I want to list all the combinations of a base36 string from 0000 to zzzz.
When I run it with a single thread it works faster (~6-5 seconds) than with multithreads (~13-14 seconds).
I read here why using more threads can be slower than using less threads.
But I have 4 cores (8 logical processors) and I don't think this is the issue in my case.

Am I doing something wrong with my code ?
Maybe the join() function slows things down ?

Here is the my code:

import time
import threading

# https://codegolf.stackexchange.com/questions/169432/increment-base-36-strings?page=1&tab=votes#tab-top
def inc_base36(s):
   L,R=s[:-1],s[-1:];
   return s and[[L+chr(ord(R)+1),inc_base36(L)+'0'][R>'y'],L+'a'][R=='9']

def bruteforce(start_id, end_id):
 while start_id != end_id:
   start_id = inc_base36(start_id)

# Single thread
# --- 5.15600013733 seconds ---
start_time = time.time()
bruteforce('0000', 'zzzz')
print("--- %s seconds ---" % (time.time() - start_time))

# Two threads
# --- 13.603000164 seconds ---
t1 = threading.Thread(target=bruteforce, args = ('0000', 'hzzz')) # in decimal (0, 839807)
t2 = threading.Thread(target=bruteforce, args = ('i000', 'zzzz')) # in decimal (839808, 1679615)

start_time = time.time()
t1.start()
t2.start()
t1.join()
t2.join()
print("--- %s seconds ---" % (time.time() - start_time))

# Three threads
# --- 14.3159999847 seconds ---
t1 = threading.Thread(target=bruteforce, args = ('0000', 'bzzz')) # in decimal (0, 559871)
t2 = threading.Thread(target=bruteforce, args = ('c000', 'nzzz')) # in decimal (559872, 1119743)
t3 = threading.Thread(target=bruteforce, args = ('o000', 'zzzz')) # in decimal (1119744, 1679615)

start_time = time.time()
t1.start()
t2.start()
t3.start()
t1.join()
t2.join()
t3.join()
print("--- %s seconds ---" % (time.time() - start_time))

Upvotes: 1

Views: 251

Answers (2)

Roland Smith
Roland Smith

Reputation: 43533

While Yossi's answer is correct, a faster solution could be to use the tools from the standard library.

In this case, itertools.product. If I've interpreted your question correctly, you could do:

In [1]: from itertools import product

In [2]: base36 = "0123456789abcdefghijklmnopqrstuvwxyz"

In [3]: res = [''.join(p) for p in product(base36, repeat=4)]

In [4]: res[0], res[-1]
Out[4]: ('0000', 'zzzz')

Let's see how fast this is:

In [5]: %timeit res = [''.join(p) for p in product(base36, repeat=4)]
800 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

As you can see, this is much faster. The timing was done using CPython 3.6 on an old core2 CPU. The itertools module in CPython is written in C, which is probably a significant part of the reason why it is faster.

And the result seems to be complete:

In [6]: len(res)
Out[6]: 1679616

In [7]: 36**4
Out[7]: 1679616

Upvotes: 0

Yossi
Yossi

Reputation: 12110

Most Python implementations has a GIL (Global Intperpreter Lock) which allows execution of a one thread at a time. You should use Jython which doesn't have GIL or implement multiprocessing in your script.

Upvotes: 1

Related Questions