CAPSLOCK
CAPSLOCK

Reputation: 6483

Optimizing code, removing "for loop"

I'm trying to remove outliers from a tick data series, following Brownlees & Gallo 2006 (if you may be interested).

The code works fine but given that I'm working on really long vectors (the biggest has 20m observations and after 20h it was not done computing) I was wondering how to speed it up.

What I did until now is:
I changed the time and date format to numeric double and I saw that it saves quite some time in processing and A LOT OF MEMORY.
I allocated memory for the vectors:

[n] = size(price);
 x = price;
    score = nan(n,'double');           %using tic and toc I saw that nan requires less time than zeros
    trimmed_mean = nan(n,'double');
    sd = nan(n,'double');
    out_mat = nan(n,'double');

Here is the loop I'd love to remove. I read that vectorizing would speed up a lot, especially using long vectors.

for i = k+1:n
    trimmed_mean(i) = trimmean(x(i-k:i-1 & i+1:i+k),10,'round');  %trimmed mean computed on the 'k' closest observations to 'i' (i is excluded)
    score(i) = x(i) - trimmed_mean(i);
    sd(i) = std(x(i-k:i-1 & i+1:i+k)); %same as the mean
    tmp = abs(score(i)) > (alpha .* sd(i) + gamma);
    out_mat(i) = tmp*1;
end

Here is what I was trying to do

trimmed_mean=trimmean(regroup_matrix,10,'round',2);
score=bsxfun(@minus,x,trimmed_mean);
sd=std(regroup_matrix,2);
temp = abs(score) > (alpha .* sd + gamma);
out_mat = temp*1;

But given that I'm totally new to Matlab, I don't know how to properly construct the matrix of neighbouring observations. I just think it should be shaped like: regroup_matrix= nan (n,2*k).

EDIT: To be specific, what I am trying to do (and I am not able to) is:
Given a column vector "x" (n,1) for each observation "i" in "x" I want to take the "k" neighbouring observations to "i" (from i-k to i-1 and from i+1 to i+k) and put these observations as rows of a matrix (n, 2*k).

EDIT 2: I made a few changes to the code and I think I am getting closer to the solution. I posted another question specific to what I think is the problem now:
Matlab: Filling up matrix rows using moving intervals from a column vector without a for loop

What I am trying to do now is:

[n] = size(price,1);
x = price;
[j1]=find(x);
matrix_left=zeros(n, k,'double');
matrix_right=zeros(n, k,'double');
toc
matrix_left(j1(k+1:end),:)=x(j1-k:j1-1);

matrix_right(j1(1:end-k),:)=x(j1+1:j1+k);

matrix_group=[matrix_left matrix_right];
trimmed_mean=trimmean(matrix_group,10,'round',2);
score=bsxfun(@minus,x,trimmed_mean);
sd=std(matrix_group,2);
temp = abs(score) > (alpha .* sd + gamma);
outmat = temp*1;

I have problems with the matrix_left and matrix_right creation. j1, that I am using for indexing is a column vector with the indices of price's observations. The output is simply

j1=[1:1:n]

price is a column vector of double with size(n,1)

Upvotes: 4

Views: 214

Answers (2)

Jonas
Jonas

Reputation: 74940

For your reshape, you can do the following:

idxArray = bsxfun(@plus,(k:n)',[-k:-1,1:k]);
reshapedArray = x(idxArray);

Upvotes: 2

CAPSLOCK
CAPSLOCK

Reputation: 6483

Thanks to Jonas that showed me the way to go I came up with this:

idxArray_left=bsxfun(@plus,(k+1:n)',[-k:-1]);           %matrix with index of left neighbours observations
idxArray_fill_left=bsxfun(@plus,(1:k)',[1:k]);          %for observations from 1:k I take the right neighbouring observations, this way when computing mean and standard deviations there will be no problems.
matrix_left=[idxArray_fill_left; idxArray_left];        %Just join the two matrices and I have the complete matrix of left neighbours
idxArray_right=bsxfun(@plus,(1:n-k)',[1:k]);            %same thing as left but opposite.
idxArray_fill_right=bsxfun(@plus,(n-k+1:n)',[-k:-1]);
matrix_right=[idxArray_right; idxArray_fill_right];      
idx_matrix=[matrix_left matrix_right];                  %complete index matrix, joining left and right indices
neigh_matrix=x(idx_matrix);                             %exactly as proposed by Jonas, I fill up a matrix of observations from 'x', following idx_matrix indexing
trimmed_mean=trimmean(neigh_matrix,10,'round',2);
score=bsxfun(@minus,x,trimmed_mean);
sd=std(neigh_matrix,2);
temp = abs(score) > (alpha .* sd + gamma);
outmat = temp*1;  

Again, thanks a lot to Jonas. You really made my day!
Thanks also to everyone that had a look to the question and tried to help!

Upvotes: 0

Related Questions