Reputation: 193
Given two random variables X and Y, where X=(x1,..,xn) and Y=(y1,...,yn) in a nx2
matrix A, so A=[X Y]
, i need to perform the next operation:
median((x-median(x))(y-median(y)))
I'm trying to obtain an estimator of the covariance matrix using the median instead the mean, for a n
xt
matrix where t
represents the number of random variables and n
the length of the data set.
So far, I made the next code:
for i=1:n
for j=1:n
a1=median(A(:,i));
a2=median(A(:,j));
SMM(i,j)=median(((A(:,i)-a1(ones(t,1),:)).*(A(:,j)-a2(ones(t,1),:))));
end
end
However, theoretically I must obtain a semidefinite (positive or negative) symmetric matrix, however that's not the case with this code.
Am I making any mistake in the code formulation?
Upvotes: 1
Views: 32
Reputation: 60494
Various points:
For each of your columns of A
(x, y), the median (a1
, a2
) doesn't change. You should compute these outside the loops.
The loops go over n
, rather than t
, which are the variables and the indices to the output matrix.
I would first subtract the median from each column, to avoid repeatedly doing the same computations:
A = A - median(A,1); % be explicit about which dimension to take the median over!
Next, we'd loop over the t
xt
output elements of the covariance matrix, and compute each of the elements:
t = size(A,2);
SMM = zeros(t,t); % always preallocate output arrays before a loop
for j=1:t
for i=1:t
SMM(i,j) = median(A(:,i).*A(:,j));
end
end
The loop can likely be vectorized, but that leads to a large intermediate matrix, which slow down code also. So it might not be worth the effort to vectorize. Only try it if this code is too slow!
It should also be possible to run the inner loop from i=j:t
, to skip computing the redundant half of the symmetric matrix, instead copying over the previously computed values.
Upvotes: 1