Reputation: 75
I have written a MATLAB program to find word bigrams and their frequencies in a text file. For this purpose I create a cell array of strings using the textread function:
unigrams = textread('file.txt','%s');
But I also want to omit a set of words such as 'to', 'the', 'is', 'or', etc., as well as the special characters '#', '$', '&' and '%', from my cell array. Is there a way to exclude these words while reading the words from the raw file?
Thanks.
Upvotes: 0
Views: 1337
Reputation:
You can use setdiff after reading the text to remove the unwanted words:
unigrams = {'I' 'like' 'this' 'or' 'that' 'Here' 'are' 'some' 'symbols' '#' '$' '&'}
setdiff(unigrams, {'the' 'is' 'or' '#' '$' '&'}, 'stable')
unigrams =
Columns 1 through 8
'I' 'like' 'this' 'or' 'that' 'Here' 'are' 'some'
Columns 9 through 12
'symbols' '#' '$' '&'
ans =
'I' 'like' 'this' 'that' 'Here' 'are' 'some' 'symbols'
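One thing to watch if the goal is bigram frequencies: setdiff treats its inputs as sets, so every remaining word comes back only once and repeat counts are lost. A minimal sketch of an alternative, assuming you want to keep repeated words, is to build a logical mask with ismember. The sample words and the variable names stopwords, keep, filtered and deduped are made up for illustration; in practice unigrams would be the cell array returned by textread.

```matlab
% Hypothetical sample data standing in for the output of textread.
unigrams  = {'the' 'cat' 'is' 'on' 'the' 'mat' '&' 'the' 'cat'};
stopwords = {'to' 'the' 'is' 'or' '#' '$' '&' '%'};

% Logical mask: true wherever a token is NOT in the stop list.
keep     = ~ismember(unigrams, stopwords);
filtered = unigrams(keep);   % {'cat' 'on' 'mat' 'cat'}, repeats and order kept

% For comparison, setdiff returns each remaining word only once.
deduped  = setdiff(unigrams, stopwords, 'stable');   % {'cat' 'on' 'mat'}
```

Because the filtering is done by indexing, filtered keeps the same orientation as unigrams, so it also works on the column cell array that textread returns.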
Upvotes: 1