Reputation: 14316
I need to calculate the variance of a large vector which is stored as uint8. The MATLAB var function, however, only accepts double and single types as input. The easiest way to calculate the variance would therefore be:
vec = randi(255,1,100,'uint8');
var(single(vec))
This of course gives the correct result. However, using the single datatype increases the memory usage by a factor of 4. For large vectors (~ 1 million elements) this will quickly fill up the memory.
What I tried: The definition of the variance for a discrete random variable X is
$$\operatorname{Var}(X) = \sum_{i=1}^{n} p_i \, (x_i - \mu)^2$$
(Source: Wikipedia)
I estimated the p's using the histogram, but then got stuck: to calculate the variance in a vectorized fashion, I would need to convert the x_i's to single or double.
Is there any possibility to calculate the variance without converting the whole vector to single or double?
Upvotes: 2
Views: 850
Reputation: 35525
No. The variance will most likely be a floating-point value, so you need to perform floating-point operations.
p_i is the probability mass function, so sum(p_i) should be one; therefore each p_i is a floating-point number.
In addition, mu, the mean, will probably not be an integer either.
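To illustrate how few floating-point values the formula actually involves, here is a minimal sketch that keeps the data as uint8 and does the floating-point work on the 256 possible values only. It assumes histcounts accepts uint8 data together with double bin edges; the variable names (counts, x, p, mu, vPop, vUnb) and the example data are made up for illustration.

```matlab
% Example data (assumption): one million uint8 samples in 0..255
vec    = randi([0 255], 1, 1e6, 'uint8');
n      = numel(vec);

% One bin per possible uint8 value; the last bin [255,256] holds the 255s
counts = histcounts(vec, 0:256);

x      = 0:255;                  % the 256 possible values, as double
p      = counts / n;             % estimated probability mass function
mu     = sum(p .* x);            % mean
vPop   = sum(p .* (x - mu).^2);  % population variance (divides by n)
vUnb   = vPop * n / (n - 1);     % rescaled to the n-1 convention used by var
```

Only the 256-element vectors counts, x and p are stored as double here; vec itself is never converted.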
Upvotes: 1
Reputation: 1104
If you're willing to work with uint16, you can do the following. It creates only 3 floating-point numbers (var and the 2 means) and uses Var(X)=Mean(X^2)-Mean(X)^2:
uivec=uint16(vec);
mean(uivec.^2)-mean(uivec)^2
So, not as good as keeping uint8, but it still needs only half the memory of converting to single. It works with uint16 because your input is uint8 and the largest possible square, 255^2=65025, is below 2^16.
If you want the exact same answer as var, you need to remember that MATLAB uses the unbiased estimator for var (it divides the sum by n-1 instead of n, where n is your number of samples), so you need to do:
n=length(vec);
v=(mean(uivec.^2)-mean(uivec)^2)*(n/(n-1))
then your v will match var(single(vec)) up to floating-point rounding.
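As a self-contained check of the uint16 approach, the following sketch compares it against var on a double copy of the same data. The example data and the names sq, m1, m2 and ref are assumptions for illustration; the double copy is created only to verify the result and would be omitted in memory-constrained code.

```matlab
% Example data (assumption): one million uint8 samples in 0..255
vec = randi([0 255], 1, 1e6, 'uint8');
n   = numel(vec);

% Squares need uint16: the largest value, 255^2 = 65025, fits below 65535
sq  = uint16(vec).^2;

m2  = mean(sq);    % mean of an integer array is returned as double
m1  = mean(vec);   % the plain mean can be taken on the uint8 data directly

% Population variance, rescaled to the n-1 convention used by var
v   = (m2 - m1^2) * n / (n - 1);

% Reference value, for verification only (this line does convert the vector)
ref = var(double(vec));
fprintf('uint16 formula: %.10f\nvar(double):    %.10f\n', v, ref);
```

Note that the parentheses around the difference matter: the n/(n-1) factor has to scale the whole population variance, not just the squared mean.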
Upvotes: 3