Trying to calculate mean and std using float32 numpy arrays. Getting float64 returned

Question

[EDIT]

Okay, my test case was poorly thought out. I only tested on 1-D arrays, in which case I get a 64-bit scalar returned. If I do it on a 3-D array, I get 32-bit results as expected.

I am trying to calculate the mean and standard deviation of a very large numpy array (600*600*4044) and I am close to the limit of my memory (16GB on a 64-bit machine). As such, I am trying to process everything as float32 rather than the default float64. However, any time I try to work on the data I get a float64 returned, even if I specify the dtype as float32. Why is this happening? Yes, I can convert afterwards, but as I said, I am close to the limit of my RAM and I am trying to keep everything as small as possible, even during the processing step. Below is an example of what I am getting.

import scipy
a = scipy.ones((600,600,4044), dtype=scipy.float32)
print(a.dtype)

a_mean = scipy.mean(a, 2, dtype=scipy.float32)
a_std = scipy.std(a, 2, dtype=scipy.float32)

print(a_mean.dtype)
print(a_std.dtype)

Returns

float32
float32
float32

David Heffernan · Accepted Answer

Note: This answer applied to the original question

You have to switch to 64 bit Python. According to your comments, your object is 5.7GB even with 32 bit floats. That cannot fit in a 32 bit address space, which is 4GB at best.
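As a rough check of that size argument, here is a minimal sketch of the arithmetic. The shape comes from the question, and 4 and 8 bytes are the standard widths of float32 and float64. Treating the 32 bit ceiling as exactly 2**32 bytes is an assumption made for illustration; a real 32 bit process usually gets less.

```python
# Shape taken from the question; item sizes are the standard widths in bytes.
shape = (600, 600, 4044)
item_size = {"float32": 4, "float64": 8}

# Total number of elements in the array.
n_values = 1
for n in shape:
    n_values *= n

# Raw storage needed for the data alone, per dtype.
n_bytes = {name: n_values * size for name, size in item_size.items()}
for name, total in n_bytes.items():
    print(f"{name}: {total:,} bytes ({total / 2**30:.2f} GiB)")

# Assumption: a 32 bit process can address at most 2**32 bytes.
address_space_32bit = 2**32
fits_in_32bit = n_bytes["float32"] <= address_space_32bit
print("float32 array fits in a 32 bit address space:", fits_in_32bit)
```

The float32 figure comes out a little under 6 billion bytes, in the same ballpark as the size quoted from the comments and well past the 32 bit limit.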

Once you've switched to 64 bit Python I think you can stop worrying about intermediate values using 64 bit floats. In fact you can quite probably perform your entire calculation using 64 bit floats.

If you are already using 64 bit Python (and your comments confused me on the matter), then you simply do not need to worry about scipy.mean or scipy.std returning a 64 bit float. That's one single value out of ~1.5 billion values in your array. It's nothing to worry about.
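If memory during the reduction itself is still a concern, one possible approach (not something the question's code does) is to reduce the array one slab at a time, so that any float64 temporaries are only as large as a slab. The following is a minimal sketch under stated assumptions: `mean_std_in_slabs` and its `slab` parameter are hypothetical names, it uses numpy directly rather than the scipy aliases, and it only handles reduction along the last axis of an array with at least two dimensions.

```python
import numpy as np

def mean_std_in_slabs(a, slab=50):
    """Mean and std along the last axis, computed slab by slab along axis 0.

    Hypothetical helper: results are stored as float32, while each slab
    is reduced with float64 accumulation.
    """
    out_shape = a.shape[:-1]
    mean = np.empty(out_shape, dtype=np.float32)
    std = np.empty(out_shape, dtype=np.float32)
    for start in range(0, a.shape[0], slab):
        block = a[start:start + slab]  # basic slicing gives a view, not a copy
        # Temporaries created here scale with the slab, not the full array.
        mean[start:start + slab] = block.mean(axis=-1, dtype=np.float64)
        std[start:start + slab] = block.std(axis=-1, dtype=np.float64)
    return mean, std

# Small stand-in for the (600, 600, 4044) array from the question.
a = np.ones((6, 6, 40), dtype=np.float32)
a_mean, a_std = mean_std_in_slabs(a, slab=4)
print(a_mean.dtype, a_mean.shape)
print(a_std.dtype, a_std.shape)
```

A smaller `slab` lowers peak memory at the cost of more loop iterations; the right value depends on how much headroom is left.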

Note: This answer applies to the new question

The code in your question produces the following output:

float32
float32
float32

In other words, the symptoms that you report are not in fact representative of reality. The reason for the confusion is that your earlier code, the code my original answer addressed, was quite different and operated on a one-dimensional array. It looks awfully like scipy returns scalars as float64. But when the return value is not a scalar, the data type is not transformed in the way you thought.
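A quick way to see which case applies is to run the same reductions on a small array and inspect the result types. This is a minimal sketch, assuming numpy is called directly (the `scipy.mean` and `scipy.std` names in the question were aliases for the numpy functions) and using a small stand-in shape so it runs instantly. The scalar result type for the 1-D case may depend on the library version, so the code prints it instead of assuming a value.

```python
import numpy as np

# Small stand-in for the (600, 600, 4044) array from the question.
a = np.ones((6, 6, 40), dtype=np.float32)

# Reducing along one axis of a 3-D array returns 2-D arrays.
a_mean = np.mean(a, axis=2, dtype=np.float32)
a_std = np.std(a, axis=2, dtype=np.float32)
print(a_mean.dtype, a_mean.shape)
print(a_std.dtype, a_std.shape)

# Reducing a 1-D array returns a scalar; print its type rather than assume it.
b = np.ones(40, dtype=np.float32)
b_mean = np.mean(b, dtype=np.float32)
b_std = np.std(b, dtype=np.float32)
print(type(b_mean), type(b_std))
```

If the scalar type matters, it can be cast explicitly, for example with `np.float32(b_mean)`; a single scalar costs only a few bytes either way.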
