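So, given a struct array `Index` with fields `Word`, `Documents`, and `Locations`, the function below takes a cell array of char arrays (one document) and inserts each word into `Index`. It also records the number (`DocNum`) of every document the word appears in, plus the word's positions within that document.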
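```matlab
function Index = InsertDoc(Index, newDoc, DocNum)
% Insert every word of newDoc (a cell array of char arrays) into Index,
% recording the document number and the word's position in the document.
for i = 1:numel(newDoc)
    % Linear, case-insensitive search over every word indexed so far
    match = find(strcmpi(newDoc{i}, [Index.Word]), 1);
    if ~isempty(match)
        Index(match).Documents{1} = unique([Index(match).Documents{1}, DocNum]);
        if numel(Index(match).Documents{1}) ~= numel(Index(match).Locations)
            % First occurrence in this document: start a new position list
            Index(match).Locations{end+1} = i;
        else
            % Word already seen in this document: append the position
            Index(match).Locations{end} = [Index(match).Locations{end}, i];
        end
    else
        % New word: append a new element to the struct array
        curr = numel(Index) + 1;
        Index(curr).Word = newDoc(i);
        Index(curr).Documents = {DocNum};
        Index(curr).Locations = {i};
    end
end
end
```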
For example:

```matlab
Doc1 = {'Matlab', 'is', 'awesome'};
Doc2 = {'Programming', 'is', 'very', 'very', 'fun'};
Doc3 = {'I', 'love', 'Matlab', 'very', 'much'};
someIndex = InitializeIndex;
% InitializeIndex just creates a struct array with the given fields and empty cell arrays
someIndex = InsertDoc(someIndex, Doc1, 1);
someIndex = InsertDoc(someIndex, Doc2, 2);
someIndex = InsertDoc(someIndex, Doc3, 3);
```
The result would be:

```
someIndex(1)
    Word: 'Matlab'
    Documents: [1 3]
    Locations: {[1] [3]}

someIndex(2)
    Word: 'is'
    Documents: [1 2]
    Locations: {[2] [2]}

someIndex(5)
    Word: 'very'
    Documents: [2 3]
    Locations: {[3 4] [4]}
```
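`InitializeIndex` isn't shown. For this numbering (with `'Matlab'` at `someIndex(1)`) it has to return a struct array with no elements, because `InsertDoc` appends new words at `numel(Index) + 1`; if it returned one element with empty fields, every word would shift up by one. A hypothetical equivalent, assuming that is what it does:

```matlab
% Hypothetical stand-in for InitializeIndex: passing {} for each field
% creates a 0x0 struct array that already has the three fields.
someIndex = struct('Word', {}, 'Documents', {}, 'Locations', {});
```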
I need to run this on a struct array of about 20000 elements holding a variety of words, and right now indexing takes an absurd amount of time to finish. How can I improve this algorithm?
Try to preallocate memory for your struct array "Index" before your loop starts, instead of growing it by one element for every new word.
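Preallocation addresses the cost of growing `Index` one element at a time, but with 20000 words the lookup is likely the larger cost: each word runs `strcmpi` against `[Index.Word]`, which rebuilds and scans a cell array of every word indexed so far, so the total work grows with (words inserted) × (distinct words). Below is a minimal sketch, assuming case-insensitive matching and that each document is inserted exactly once. It swaps that linear search for a `containers.Map` hash map from the lowercased word to its slot, and it preallocates storage that doubles when full. The names `slotOf`, `capacity`, `Words`, `Documents`, and `Locations` are hypothetical, and the struct array is built only at the end, so this is a different layout from the incremental `InsertDoc` in the question.

```matlab
% Sketch: preallocated storage plus a containers.Map (hash map) from the
% lowercased word to its slot, replacing the linear strcmpi search.
% Assumes each document is inserted exactly once, all of its words at once.
docs = {{'Matlab', 'is', 'awesome'}, ...
        {'Programming', 'is', 'very', 'very', 'fun'}, ...
        {'I', 'love', 'Matlab', 'very', 'much'}};

capacity  = 1000;                % assumed starting size, doubled when full
Words     = cell(capacity, 1);   % Words{k}: k-th distinct word (first spelling seen)
Documents = cell(capacity, 1);   % Documents{k}: document numbers containing it
Locations = cell(capacity, 1);   % Locations{k}: one position vector per document
slotOf    = containers.Map('KeyType', 'char', 'ValueType', 'double');
n = 0;                           % number of distinct words stored so far

for DocNum = 1:numel(docs)
    newDoc = docs{DocNum};
    for i = 1:numel(newDoc)
        key = lower(newDoc{i});  % case-insensitive, like strcmpi
        if isKey(slotOf, key)
            k = slotOf(key);     % hash lookup instead of scanning every word
            if Documents{k}(end) == DocNum
                % Already seen in this document: append the position
                Locations{k}{end} = [Locations{k}{end}, i];
            else
                % First occurrence in this document
                Documents{k}(end+1) = DocNum;
                Locations{k}{end+1} = i;
            end
        else
            n = n + 1;
            if n > capacity
                % Out of room: double the preallocated storage
                capacity = 2 * capacity;
                Words{capacity, 1}     = [];
                Documents{capacity, 1} = [];
                Locations{capacity, 1} = [];
            end
            slotOf(key)  = n;
            Words{n}     = newDoc{i};
            Documents{n} = DocNum;
            Locations{n} = {i};
        end
    end
end

% Trim the unused slots and build the struct array in one call
someIndex = struct('Word', Words(1:n), 'Documents', Documents(1:n), ...
                   'Locations', Locations(1:n));
```

Run on the three example documents, this should reproduce the entries listed in the question, with `Word` stored as a char array and `Documents` as a plain vector. To index documents one call at a time instead, the map and the preallocated cell arrays would need to be kept alongside `Index` between calls.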