Reputation: 337
I have some trouble using threading and scipy.stats.randint module. Indeed, when several threads are launched, a local array (bootIndexs in the code below) seems to be used for all launched thread.
This is the raised Error
> Exception in thread Thread-559:
Traceback (most recent call last):
...
File "..\calculDomaine3.py", line 223, in bootThread
result = bootstrap(nbB, distMod)
File "...\calculDomaine3.py", line 207, in bootstrap
bootIndexs = spstats.randint.rvs(0, nbTirages-1, size = nbTirages)
File "C:\Python27\lib\site-packages\scipy\stats\distributions.py", line 5014, in rvs
return super(rv_discrete, self).rvs(*args, **kwargs)
File "C:\Python27\lib\site-packages\scipy\stats\distributions.py", line 582, in rvs
vals = reshape(vals, size)
File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 171, in reshape
return reshape(newshape, order=order)
ValueError: total size of new array must be unchanged
And this is my code :
import threading
import Queue
from scipy import stats as spstats
nbThreads = 4
def test(nbBoots, nbTirages, modules ):
def bootstrap(nbBootsThread, distribModules) :
distribMax = []
for j in range(nbBootsThread):
bootIndexs = spstats.randint.rvs(0, nbTirages-1, size = nbTirages)
boot = [distribModules[i] for i in bootIndexs]
distribMax.append(max(boot))
return distribMax
q = Queue.Queue()
def bootThread (nbB, distMod):
result = bootstrap(nbB, distMod )
q.put(result, False)
q.task_done()
works = []
for i in range(nbThreads) :
works.append(threading.Thread(target = bootThread, args = (nbBoots//nbThreads, modules[:],) ))
for w in works:
w.daemon = True
w.start()
q.join()
distMaxResult = []
for j in range(q.qsize()):
distMaxResult += q.get()
return distMaxResult
class classTest:
def __init__(self):
self.launch()
def launch(self):
print test(100, 1000, range(1000) )
Thanks for your answers.
Upvotes: 2
Views: 451
Reputation: 22897
I have no experience with threading, so this might be completely off the mark.
scipy.stats.randint, as the other distributions in scipy.stats, is an instance of the corresponding distribution class. This means that every thread is accessing the same instance. During the rvs call an attribute _size
is set. If a different thread with a different size accesses the instance in the meantime, then you would get the ValueError that the sizes don't match in the reshape. This sounds like the race condition to me.
I would recommend to use numpy.random directly in this case (this is the call in scipy.stats.randint)
numpy.random.randint(min, max, self._size)
maybe you have better luck there.
If you need a distribution that is not available in numpy.random, then you would need to instantiate new instances of the distribution in each thread, if my guess is correct.
Upvotes: 1
Reputation: 43031
Indeed, when several threads are launched, a local array (bootIndexs in the code below) seems to be used for all launched thread.
That's the entire point of threads: lightweight tasks that share everything with their spawning process! :) If you are looking for a share-nothing solution, than you should perhaps look at the multiprocessing module (keep in mind spawing a process is much heavier on the system than spawning a thread, though).
However, back to your problem... mine is little more than a shot in the dark, but you could try to change this line:
boot = [distribModules[i] for i in bootIndexs]
to:
boot = [distribModules[i] for i in bootIndexs.copy()]
(using a copy of the array rather than the array itself). This seems unlikely to be the issue (you are just iterating over the array, not actually using it), but is the only point I can see when you use it in your thread so...
This of course works if your array content is not to be changed by the threads manipulating it. If changing the value of the "global" array is the correct behaviour, then you should contrarily implement a Lock()
to forbid simultaneous access to that resource. Your threads should then do something like:
lock.acquire()
# Manipulate the array content here
lock.release()
Upvotes: 2