I have a 763x6 cell array, attributes.
I have no syntax errors, but the distance matrix D that results from my code has the same values for each row. I don't know how to make my distance function handle multiple rows / instances.
Sample of my data (5x6 cell):
'low back pain risk factor staff' 'low back pain' 'low back pain risk factor staff' 'back pain pain risk factor epidemiology' 'spiritual comment comment care be' 'spiritual comment comment care be'
'psd psd antipsychotic essential receptor' 'ht ht 5' 'antipsychotic protein signal receptor drug' 'cell protein signal cell receptor' 'spiritual comment comment care be' 'spiritual comment comment care be'
'school of medicine' 'case western reserve university' 'antidepressant action 5 for in' 'ht ht 5' 'spiritual comment comment care be' 'spiritual comment comment care be'
'spiritual comment comment care be' 'heal holistic comment india india' 'heal religious mental disorder psychiatric symptom' 'heal religious mental disorder psychiatric psychiatric' 'spiritual comment comment care be' 'spiritual comment comment care be'
The problem is with your distance function, which needs to be able to return multiple distances when given multiple rows in the second argument, as detailed in the table in the pdist2 documentation.
It also seems to be handling the cell arrays generated by regexp incorrectly. By using cellfun to pass the individual words to intersect, the code asks intersect to compare the letters in different words rather than the words themselves.
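To see the difference, here is a small sketch using two strings from the sample data (the variable names a, b, commonLetters and commonWords are illustrative, not from the question's code). Passed directly, intersect treats each character vector as a set of letters; once the strings are split into words with regexp, it compares whole words.

```matlab
% Two attribute strings taken from the sample data in the question
a = 'low back pain';
b = 'back pain';

% Passing the character vectors straight to intersect compares single characters
commonLetters = intersect(a, b);          % ' abciknp' (sorted, unique letters)

% Splitting into words first makes intersect compare whole words
wordsA = regexp(a, '\s+', 'split');       % {'low', 'back', 'pain'}
wordsB = regexp(b, '\s+', 'split');       % {'back', 'pain'}
commonWords = intersect(wordsA, wordsB);  % {'back', 'pain'}
nShared = numel(commonWords);             % 2 shared words
```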
I believe the following function returns values with the desired effect:
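function D2 = intersection(XI,XJ)
% XI: 1-by-n cell (one observation), XJ: m-by-n cell (m observations)
% Split every cell into a cell array of words
wordsI = regexp(XI, '\s+', 'split');
wordsJ = regexp(XJ, '\s+', 'split');
% One value per row of XJ, returned as an m-by-1 column vector
D2 = zeros(size(XJ,1),1);
for i=1:numel(D2)
    % Count the shared words column by column, then sum across columns
    D2(i) = sum(cellfun(@(wI,wJ) numel(intersect(wI,wJ)), wordsI, wordsJ(i,:)));
end
end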
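The function relies on regexp returning, for a cell array input, a cell array of the same size whose elements are themselves cell arrays of words; that is why wordsJ(i,:) can be paired column by column with wordsI. A quick check, assuming a hypothetical 2x2 XJ built from the sample strings:

```matlab
% Hypothetical XJ: two observations (rows), two attribute columns
XJ = {'low back pain', 'back pain'; 'ht ht 5', 'school of medicine'};

% For a cell array input, regexp returns a cell array of the same size;
% each element is itself a 1-by-k cell array of words
wordsJ = regexp(XJ, '\s+', 'split');

sz = size(wordsJ);                  % [2 2], same as XJ
firstWords = wordsJ{1, 1};          % {'low', 'back', 'pain'}
nWords = cellfun(@numel, wordsJ);   % words per cell: [3 2; 3 3]

% wordsJ(2, :) is a 1-by-2 cell holding the word lists of row 2,
% the shape that cellfun pairs with wordsI inside intersection()
row2 = wordsJ(2, :);
```

Note that intersection() counts shared words, so larger values mean more similar rows; it behaves as a similarity rather than a distance.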
This is not a solution, but it is too long to fit in a comment. The problem is in how pdist2 calculates the pair-wise distances.
To check this quickly, we can pass it a distance function that just prints the XI and XJ arguments passed to it (when it is called from pdist2):
X = {'foo1', 'foo2', 'foo3', 'foo4', 'foo5', 'foo6';...
'bar1', 'bar2', 'bar3', 'bar4', 'bar5', 'bar6'};
% call distance function via pdist2
D = pdist2(X,X,@printArgsIn);
And in a function file:
function D2 = printArgsIn(XI,XJ)
disp('XI'); disp(XI);
disp('XJ'); disp(XJ);
D2 = 1;
end
This prints the following:
XI
'foo1' 'foo2' 'foo3' 'foo4' 'foo5' 'foo6'
XJ
'foo1' 'foo2' 'foo3' 'foo4' 'foo5' 'foo6'
XI
'foo1' 'foo2' 'foo3' 'foo4' 'foo5' 'foo6'
XJ
'foo1' 'foo2' 'foo3' 'foo4' 'foo5' 'foo6'
'bar1' 'bar2' 'bar3' 'bar4' 'bar5' 'bar6'
XI
'bar1' 'bar2' 'bar3' 'bar4' 'bar5' 'bar6'
XJ
'foo1' 'foo2' 'foo3' 'foo4' 'foo5' 'foo6'
'bar1' 'bar2' 'bar3' 'bar4' 'bar5' 'bar6'
Ignoring the first XI, XJ pair (if you look at pdist2 in detail, you'll see the distance function is called once to test that it works), you can see that pdist2 calls the distance function once per observation of its first input: each call passes a single observation (row) as XI and all observations of the second input as XJ.
In other words, it expects your distance function to handle multiple rows/instances in XJ and to return a column vector of distances, one per row of XJ. I haven't looked at your distance function in detail, but I don't think you are allowing for this.
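The calling pattern in the printed output can be mimicked with a plain loop. This is a simplified, hypothetical stand-in, not pdist2's actual source, and goodDist and badDist are toy functions invented for illustration. goodDist returns one value per row of XJ; badDist only looks at the first row of XJ and returns a scalar, which in this loop gets broadcast across the whole row of the result, one plausible way to end up with rows of repeated values.

```matlab
X = {'foo1', 'foo2', 'foo3'; 'bar1', 'bar2', 'bar3'};
Y = X;

% Toy row-aware distance: for each row of XJ, count the columns whose strings differ from XI
goodDist = @(XI, XJ) sum(~strcmp(repmat(XI, size(XJ, 1), 1), XJ), 2);
% Toy distance that ignores all but the first row of XJ and returns a scalar
badDist = @(XI, XJ) sum(~strcmp(XI, XJ(1, :)));

D = zeros(size(X, 1), size(Y, 1));
Dbad = zeros(size(X, 1), size(Y, 1));
for i = 1:size(X, 1)
    % One row of X as XI, every row of Y as XJ, as in the printed output above
    D(i, :) = goodDist(X(i, :), Y)';    % size(Y,1)-by-1 column, transposed into row i
    Dbad(i, :) = badDist(X(i, :), Y);   % a scalar is broadcast across row i
end
% D    is [0 3; 3 0]
% Dbad is [0 0; 3 3]
```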