Reputation: 10789
I need to generate masks for dropout for a specific neural network. I am looking at the fastest way possible to achieve this using numpy (CPU only).
I have tried:
import numpy as np

def gen_mask_1(size, p=0.75):
    return np.random.binomial(1, p, size)

def gen_mask_2(size, p=0.75):
    mask = np.random.rand(size)
    mask[mask > p] = 0
    mask[mask != 0] = 1
    return mask
where p is the probability of drawing a 1.
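(For context, such a mask is typically multiplied elementwise into a layer's activations; with "inverted" dropout the surviving units are also rescaled by 1/p so the expected activation is unchanged. A minimal sketch — the function name and the rescaling step are illustrative, not part of the question:)

```python
import numpy as np

def apply_dropout(x, p=0.75):
    # keep each unit with probability p, then rescale survivors by 1/p
    # so the expected value of each activation is preserved
    mask = (np.random.rand(*x.shape) < p).astype(x.dtype)
    return x * mask / p

out = apply_dropout(np.ones(2048))
# surviving entries equal 1/p, dropped entries are 0
```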
The speed of these two approaches is comparable.
%timeit gen_mask_1(size=2048)
%timeit gen_mask_2(size=2048)
45.9 µs ± 575 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
47.4 µs ± 372 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Are there faster methods?
UPDATE
Following the suggestions received so far, I have tested a few extra implementations. I couldn't get @njit
to work when setting parallel=True
(TypingError: Failed in nopython mode pipeline (step: convert to parfors)
); it works without it, but, I think, less efficiently.
I have found a Python binding for Intel's mkl_random
(thank you @MatthieuBrucher for the tip!) here: https://github.com/IntelPython/mkl_random
So far, using mkl_random together with @nxpnsv's approach gives the best result.
@njit
def gen_mask_3(size, p=0.75):
    mask = np.random.rand(size)
    mask[mask > p] = 0
    mask[mask != 0] = 1
    return mask

def gen_mask_4(size, p=0.75):
    return (np.random.rand(size) < p).astype(int)

def gen_mask_5(size):
    return np.random.choice([0, 1, 1, 1], size=size)

def gen_mask_6(size, p=0.75):
    return (mkl_random.rand(size) < p).astype(int)

def gen_mask_7(size):
    return mkl_random.choice([0, 1, 1, 1], size=size)
%timeit gen_mask_4(size=2048)
%timeit gen_mask_5(size=2048)
%timeit gen_mask_6(size=2048)
%timeit gen_mask_7(size=2048)
22.2 µs ± 145 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
25.8 µs ± 336 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
7.64 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
29.6 µs ± 1.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Upvotes: 1
Views: 335
Reputation: 173
As I said in a comment on the question, the implementation
def gen_mask_2(size, p=0.75):
    mask = np.random.rand(size)
    mask[mask > p] = 0
    mask[mask != 0] = 1
    return mask
can be improved by using the fact that the comparison yields a bool array, which can then be converted to int. This removes the two comparisons with masked assignments you otherwise have, and it makes for a pretty one-liner :)
def gen_mask_2(size, p=0.75):
    return (np.random.rand(size) < p).astype(int)
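A quick sanity check of the one-liner (an illustration, with values varying run to run): the mask is all 0s and 1s, and the fraction of 1s is close to p.

```python
import numpy as np

def gen_mask_2(size, p=0.75):
    return (np.random.rand(size) < p).astype(int)

mask = gen_mask_2(size=2048)
# mask.mean() is close to p = 0.75; mask contains only 0s and 1s
```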
Upvotes: 1
Reputation: 114831
Another option is numpy.random.choice, with an input of 0s and 1s where the proportion of 1s is p. For example, for p = 0.75, use np.random.choice([0, 1, 1, 1], size=n):
In [303]: np.random.choice([0, 1, 1, 1], size=16)
Out[303]: array([1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0])
This is faster than using np.random.binomial:
In [304]: %timeit np.random.choice([0, 1, 1, 1], size=10000)
71.8 µs ± 368 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [305]: %timeit np.random.binomial(1, 0.75, 10000)
174 µs ± 348 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
To handle an arbitrary value of p, you can use the p option of np.random.choice, but then the code is slower than np.random.binomial:
In [308]: np.random.choice([0, 1], p=[0.25, 0.75], size=16)
Out[308]: array([1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0])
In [309]: %timeit np.random.choice([0, 1], p=[0.25, 0.75], size=10000)
227 µs ± 781 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
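As a quick check that the p option produces the intended proportion (an illustration added here, not part of the original answer):

```python
import numpy as np

# draw a large sample and inspect the empirical fraction of ones
sample = np.random.choice([0, 1], p=[0.25, 0.75], size=100_000)
frac_ones = sample.mean()
# frac_ones is close to 0.75
```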
Upvotes: 2
Reputation: 39052
You can make use of the Numba compiler and make things faster by applying the njit decorator to your functions. Below is an example for a very large size:
import numpy as np
from numba import njit

def gen_mask_1(size, p=0.75):
    return np.random.binomial(1, p, size)

@njit(parallel=True)
def gen_mask_2(size, p=0.75):
    mask = np.random.rand(size)
    mask[mask > p] = 0
    mask[mask != 0] = 1
    return mask
%timeit gen_mask_1(size=100000)
%timeit gen_mask_2(size=100000)
2.33 ms ± 215 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
512 µs ± 25.1 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
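One general Numba caveat worth noting (my addition, not part of the original answer): the first call to an @njit function pays the compilation cost, so %timeit results are only meaningful after a warm-up call. A sketch, with a guarded import so it also runs where Numba is not installed:

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # fallback: no-op decorator so the example still runs without Numba
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda f: f

@njit
def gen_mask(size, p=0.75):
    mask = np.random.rand(size)
    mask[mask > p] = 0
    mask[mask != 0] = 1
    return mask

gen_mask(8)  # warm-up call triggers JIT compilation (when Numba is present)
# time only after the warm-up, e.g. %timeit gen_mask(100000)
```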
Upvotes: 2