Reputation: 301
I am normalizing a data set using the command
X=bsxfun(@times,bsxfun(@minus,X,min(X,[],1)),1./max(X,[],1))
I tried this command on two different data sets. One had negative values and the other didn't. The data set with no negative values was normalized perfectly between 0 and 1, but the one with negative values was not normalized properly. Can this be fixed? Is there another way I can normalize a data set with negative values?
Upvotes: 0
Views: 692
Reputation: 119
Ok, this post really disturbed me.
I had never heard of bsxfun. I had been using arrayfun, cellfun and structfun. So I wondered why anyone would use it, and I thought speed might be the answer. So I ran a quick test:
X = magic(3);
tic
Y = bsxfun(@minus, X, min(X(:)));
X_normalized = bsxfun(@rdivide, Y, max(Y(:)));
toc
tic
arrayfun(@(x) (x-min(X(:)))./(max(X(:))-min(X(:))),X);
toc
And I got an answer:
Elapsed time is 0.004130 seconds.
Elapsed time is 0.002468 seconds.
That made me think arrayfun was the way to go. But arrayfun might only be faster because X is small, so I tried a bigger X (X = magic(100);). Sure enough, bsxfun is much faster there, which means I'll need to recode some stuff.
Elapsed time is 0.003342 seconds.
Elapsed time is 0.395347 seconds.
However, not satisfied with these findings, I decided to run the test several times to make sure it was not a fluke. And this is where it starts getting disturbing.
test= repmat({zeros(2,10)},2,1);
Xsizes = [3 100];
for ii=1:2,for jj=1:10
X = magic(Xsizes(ii));
tic
Y = bsxfun(@minus, X, min(X(:)));
X_normalized = bsxfun(@rdivide, Y, max(Y(:)));
test{ii}(1,jj)=toc;
tic
arrayfun(@(x) (x-min(X(:)))./(max(X(:))-min(X(:))),X);
test{ii}(2,jj)=toc;
end;end
display('small Size data')
test{1}
display('Big Size data')
test{2}
I expected arrayfun to always be faster for small data and bsxfun to be faster for large data. However, bsxfun is faster in both cases, and the first iteration of each set takes longer than the rest.
small Size data
ans =
1.0e-03 *
0.4900 0.0470 0.0430 0.0410 0.0410 0.0420 0.0420 0.0410 0.0420 0.0410
0.6600 0.4200 0.4040 0.3890 0.3920 0.3900 0.3920 0.3890 0.3960 0.3900
Big Size data
ans =
0.0003 0.0001 0.0001 0.0001 0.0001 0.0001 0.0002 0.0001 0.0001 0.0001
0.3853 0.3871 0.3846 0.3855 0.3874 0.3844 0.3863 0.3840 0.3860 0.3853
This is puzzling me, even more so because if you run the test again with X=magic(3) outside the for loop, bsxfun always takes longer than arrayfun.
Elapsed time is 0.004891 seconds.
Elapsed time is 0.002008 seconds.
Elapsed time is 0.003181 seconds.
Elapsed time is 0.001994 seconds.
Elapsed time is 0.003109 seconds.
Elapsed time is 0.002008 seconds.
Any hints?
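One possible hint, offered as a sketch rather than a definitive explanation: single tic/toc measurements of sub-millisecond code are noisy, and the slow first iteration looks like one-time warm-up cost (an assumption, not something measured here). Assuming timeit is available (MATLAB R2013b or later), it calls each function several times and returns a median, which should give steadier numbers. The handle names f_bsx and f_arr are made up for this example.

```matlab
X = magic(100);

% Wrap each approach in a zero-argument function handle, as timeit expects.
f_bsx = @() bsxfun(@rdivide, bsxfun(@minus, X, min(X(:))), max(X(:)) - min(X(:)));
f_arr = @() arrayfun(@(x) (x - min(X(:))) ./ (max(X(:)) - min(X(:))), X);

% timeit runs each handle multiple times and reports a typical time in seconds.
t_bsx = timeit(f_bsx);
t_arr = timeit(f_arr);

fprintf('bsxfun:   %.6f s\narrayfun: %.6f s\n', t_bsx, t_arr);
```

Repeating this with X = magic(3) would show whether the small-data difference survives once warm-up effects are averaged out.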
Upvotes: 0
Reputation: 32920
The culprit lies in your normalization. You subtract min(X) from X and then divide by max(X), instead of dividing by max(X - min(X)).
What you should be doing is breaking this into two steps:
Y = bsxfun(@minus, X, min(X));
X_normalized = bsxfun(@rdivide, Y, max(Y));
Note that your original command didn't work properly anyway, neither for positive nor for negative values.
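To make the difference concrete, here is a minimal sketch comparing the original one-liner with the two-step version. The 3-by-2 matrix is made up for illustration, and the names X_orig and X_norm are not from the question.

```matlab
% Small illustrative matrix (made up for this example) with negative values.
X = [-2  1;
      0  3;
      4  5];

% Original one-liner: divides by max(X), not by the column range.
X_orig = bsxfun(@times, bsxfun(@minus, X, min(X, [], 1)), 1 ./ max(X, [], 1));

% Two-step version: shift each column to start at 0, then divide by its range.
Y = bsxfun(@minus, X, min(X));
X_norm = bsxfun(@rdivide, Y, max(Y));

disp(X_orig)   % first column reaches 1.5, second only 0.8
disp(X_norm)   % every column spans exactly 0 to 1
```

On MATLAB R2016b or later, implicit expansion should allow the same result without bsxfun, as (X - min(X)) ./ (max(X) - min(X)). Neither form guards against a constant column, where the range is zero.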
A few more notes:
min(X, [], 1) can be shortened to min(X), as long as X has more than one row. The same goes for max.
Instead of using times in bsxfun to multiply by 1 ./ max(Y), you can use rdivide.
Hope this helps!
Upvotes: 3