Reputation: 4938
I am wondering why scipy's random variable class stats.bernoulli yields ndarrays with dtype int32 as samples:
> stats.bernoulli.rvs(0.3, size=10)
array([0, 1, 0, 1, 0, 1, 0, 0, 0, 1])
> stats.bernoulli.rvs(0.3, size=10).dtype
dtype('int32')
Using 32-bit integer values for binary results seems extremely inefficient. I would have expected a dtype of np.bool or np.int8.
Does anyone know the reason for the decision to generate int32 arrays as samples?
Remark: I am working with big (10^8) samples. Strangely, converting to int8 does not give me any better performance, neither when creating the array nor when computing functions over it. Maybe that is because my CPU can only handle 32-bit / 64-bit chunks...
Upvotes: 1
Views: 938
Reputation: 908
On my system it's int64, so yes, it's just the platform's default integer size. As for why not bool: in the source code, bernoulli draws its samples through the binomial sampler (a binomial with n=1), which returns integers.
The only way I can think of is to pre-initialize your output arrays with dtype=np.bool, if you can. Although you will still use some extra memory while stats.bernoulli.rvs generates its integer samples, you can free that temporary array right after copying the values over.
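A minimal sketch of that idea, assuming a reasonably recent SciPy whose rvs accepts a NumPy Generator as random_state. The helper name bernoulli_bool and the chunk size are made up for illustration. Generating in chunks keeps the integer temporary small, so only the 1-byte-per-sample boolean array ever reaches full size:

```python
import numpy as np
from scipy import stats


def bernoulli_bool(p, size, chunk=1_000_000, seed=None):
    """Fill a preallocated boolean array with Bernoulli(p) samples, chunk by chunk.

    Hypothetical helper, not part of SciPy.
    """
    rng = np.random.default_rng(seed)
    # np.bool_ is spelled this way because it exists in every NumPy version.
    out = np.empty(size, dtype=np.bool_)  # 1 byte per sample
    for start in range(0, size, chunk):
        stop = min(start + chunk, size)
        # The integer temporary holds at most `chunk` samples and is
        # released once its values are copied (and cast) into `out`.
        out[start:stop] = stats.bernoulli.rvs(
            p, size=stop - start, random_state=rng
        )
    return out


samples = bernoulli_bool(0.3, size=10_000, seed=0)
print(samples.dtype, samples.nbytes)
```

If you do not need the scipy.stats interface at all, comparing uniform draws against p (for example rng.random(n) < p) also produces a boolean array directly, though the float temporary is then 8 bytes per sample, so chunking still helps for 10^8 samples.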
Upvotes: 1