Reputation: 407
I want to compare two strings using the r-contiguous matching rule: the strings match if they share a run of at least r contiguous characters. For example, with r = 6 the comparison should return true for Example 1 and false for Example 2.
Example 1:
A='ABCDEFGHIJKLM'
B='XYZ0123EFGHIJAB'
return true (since they both contain the 6-character contiguous match 'EFGHIJ')
Example 2:
A='ABCDEFGHJKLM'
B='XYZ0123EFGHAB'
return false (since their longest contiguous match, 'EFGH', has only 4 characters)
What is the fastest way to do this in MATLAB, given that my data is huge? Thanks.
Upvotes: 3
Views: 132
Reputation: 221614
Case: Input strings with unique characters
Here's one approach with ismember & strfind:
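matches = ismember(A,B);                 %// OR any(bsxfun(@eq,A,B.'),1)
%// matches(k) is true when A(k) occurs anywhere in B; order within B is not checked
matches_ext = [0 matches 0];             %// pad so every run of ones has a start and a stop
starts = strfind(matches_ext,[0 1]);     %// index just before each run begins
stops = strfind(matches_ext,[1 0]);      %// index of the last element of each run
interval_lens = stops - starts;          %// run lengths
out = any(interval_lens >= r);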
Here's another with diff & find instead of strfind:
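matches = ismember(A,B);                 %// OR any(bsxfun(@eq,A,B.'),1)
matches_ext = [0 matches 0];             %// pad so every run of ones has a rise and a fall
df = diff(matches_ext);                  %// +1 where a run begins, -1 just after it ends
interval_lens = find(df == -1) - find(df == 1);   %// run lengths
out = any(interval_lens >= r);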
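To make the padding and differencing steps concrete, here is a small trace of the diff & find variant on Example 1 from the question with r = 6. The intermediate values in the comments were worked out by hand for this one input and are only meant as an illustration.

```matlab
A = 'ABCDEFGHIJKLM';
B = 'XYZ0123EFGHIJAB';
r = 6;

matches = ismember(A,B);          %// [1 1 0 0 1 1 1 1 1 1 0 0 0]
matches_ext = [0 matches 0];      %// zero on each side closes runs at the edges
df = diff(matches_ext);           %// +1 at positions 1 and 5, -1 at positions 3 and 11
run_starts = find(df == 1);       %// [1 5]
run_stops  = find(df == -1);      %// [3 11]
interval_lens = run_stops - run_starts;   %// [2 6] -> runs 'AB' and 'EFGHIJ'
out = any(interval_lens >= r);    %// true, because one run has length 6
```

The run of length 2 comes from 'AB', which also occurs in B; only the run of length 6 satisfies r = 6.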
Here's another with 1D convolution:
matches = ismember(A,B);                 %// OR any(bsxfun(@eq,A,B.'),1)
out = any(conv(double(matches),ones(1,r)) == r);   %// a window sum of r means r consecutive matches
Case: Input strings with non-unique characters
Here's one approach using bsxfun:
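matches = bsxfun(@eq,A,B.');             %// numel(B)-by-numel(A) table: matches(i,j) is B(i) == A(j)
matches(end+1,:) = false;                %// padding row so a diagonal cannot wrap into a later column
intv = (0:r-1)*(size(matches,1)+1);      %// linear-index offsets of r steps down a diagonal, starting at 0
idx = find(matches);                     %// candidate diagonal starts
idx = idx(idx <= max(idx) - max(intv));  %// keep starts whose diagonal can still end on a match
out = any(all(matches(bsxfun(@plus,idx,intv)),2));   %// true if any diagonal of length r is all ones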
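The 1D convolution idea can also be carried over to strings with repeated characters. The following is a hedged sketch, not part of the answer above: it convolves the pairwise equality table with an r-by-r identity kernel, so a result of r marks a diagonal of r consecutive character matches. The name rmatch is hypothetical, and the sketch assumes A and B are row character vectors.

```matlab
%// rmatch is a hypothetical helper name, introduced only for this sketch.
%// bsxfun(@eq,A,B.') builds the numel(B)-by-numel(A) equality table;
%// conv2 with eye(r) sums each length-r diagonal of that table.
rmatch = @(A,B,r) nnz(conv2(double(bsxfun(@eq,A,B.')), eye(r), 'valid') == r) > 0;

%// Example 1 from the question: shares 'EFGHIJ'
out1 = rmatch('ABCDEFGHIJKLM', 'XYZ0123EFGHIJAB', 6);

%// Example 2 from the question: longest shared run is 'EFGH'
out2 = rmatch('ABCDEFGHJKLM', 'XYZ0123EFGHAB', 6);
```

Like the bsxfun approach, this builds a table with numel(A)*numel(B) entries, so for very long strings memory use may matter more than speed.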
Upvotes: 4