phynfo

Reputation: 4938

Why does scipy's stats.bernoulli.rvs yield an array with dtype int32?

I am wondering why scipy's random variable class stats.bernoulli yields ndarrays with dtype int32 as samples:

 > stats.bernoulli.rvs(0.3, size=10)
 array([0, 1, 0, 1, 0, 1, 0, 0, 0, 1])
 > stats.bernoulli.rvs(0.3, size=10).dtype
 dtype('int32')

Using 32-bit integers for binary results seems extremely inefficient. I would have expected a dtype of np.bool or np.int8.

Does anyone know the reason for the decision to generate int32 arrays as samples?

Remark: I am working with big (10^8) samples. Strangely, converting to int8 does not give me any better performance, neither when creating the array nor when computing functions over it. Maybe that is because my CPU can only handle 32-bit / 64-bit chunks...

Upvotes: 1

Views: 938

Answers (1)

yevgeniy

Reputation: 908

On my system it's int64, so yes, it is simply the platform's default integer size. As for why not bool: in the source code, stats.bernoulli draws its samples through the binomial generator (a binomial with n=1), which returns integers. The only workaround I can think of is to pre-allocate your output array with dtype=np.bool, if you can, and copy the samples into it. You will still spend some memory on the integer array that stats.bernoulli.rvs generates, but you can free it right afterwards.
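A minimal sketch of that pre-allocation idea, assuming it is acceptable to draw the samples in chunks. The helper name `bernoulli_bool` and its `chunk` and `seed` parameters are hypothetical, made up for illustration, and not part of SciPy. Only one chunk of integers exists at a time, while the full result is stored at one byte per sample:

```python
import numpy as np
from scipy import stats


def bernoulli_bool(p, size, chunk=100_000, seed=None):
    """Draw `size` Bernoulli(p) samples into a pre-allocated boolean array.

    Hypothetical helper for illustration; `chunk` bounds the size of the
    temporary integer array that stats.bernoulli.rvs returns.
    """
    rng = np.random.default_rng(seed)
    out = np.empty(size, dtype=np.bool_)  # one byte per sample
    for start in range(0, size, chunk):
        stop = min(start + chunk, size)
        # rvs still returns the default integer dtype; assigning into the
        # boolean slice casts 0 -> False and 1 -> True.
        out[start:stop] = stats.bernoulli.rvs(
            p, size=stop - start, random_state=rng
        )
    return out


samples = bernoulli_bool(0.3, size=1_000_000, seed=0)
print(samples.dtype, samples.nbytes)
```

This only reduces memory use. Whether it also speeds up later computations depends on the operations involved, which matches the observation in the question that int8 brought no performance gain.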

Upvotes: 1
