Reputation: 255
I have implemented cosine similarity in Matlab like this. In fact, I have a two-dimensional 50-by-50 matrix. To obtain a cosine should I compare items in a line by line form.
for j = 1:50
x = dat(j,:);
for i = j+1:50
y = dat(i,:);
c = dot(x,y);
sim = c/(norm(x,2)*norm(y,2));
end
end
Is this correct? and The question is this: wath is the complexity or O(n) in this state?
Upvotes: 0
Views: 2800
Reputation: 5014
Just a note on an efficient implementation of the same thing using vectorized and matrix-wise operations (which are optimized in MATLAB). This can have huge time savings for large matrices:
dat = randn(50, 50);
OP (double-for) implementation:
sim = zeros(size(dat));
nRow = size(dat,1);
for j = 1:nRow
x = dat(j, :);
for i = j+1:nRow
y = dat(i, :);
c = dot(x, y);
sim(j, i) = c/(norm(x,2)*norm(y,2));
end
end
Vectorized implementation:
normDat = sqrt(sum(dat.^2, 2)); % L2 norm of each row
datNorm = bsxfun(@rdivide, dat, normDat); % normalize each row
dotProd = datNorm*datNorm'; % dot-product vectorized (redundant!)
sim2 = triu(dotProd, 1); % keep unique upper triangular part
Comparisons for 1000 x 1000 matrix: (MATLAB 2013a, x64, Intel Core i7 960 @ 3.20GHz)
Elapsed time is 34.103095 seconds.
Elapsed time is 0.075208 seconds.
sum(sum(sim-sim2))
ans =
-1.224314766369880e-14
Upvotes: 2
Reputation: 74
Better end with 49. Maybe you should also add an index to sim?
for j = 1:49
x = dat(j,:);
for i = j+1:50
y = dat(i,:);
c = dot(x,y);
sim(j) = c/(norm(x,2)*norm(y,2));
end
end
The complexity should be roughly like o(n^2), isn't it? Maybe you should have a look at correlation functions ... I don't get what you want to write exactly, but it looks like you want to do something similar. There are built-in correlation functions in Matlab.
Upvotes: 1