Ha Hacker

Reputation: 407

r-contiguous matching, MATLAB

I want to compare two strings using the r-contiguous matching rule: they match if some run of at least r contiguous characters occurs in both. For example, with r = 6 the comparison should return true for the first example and false for the second.

Example 1:

A='ABCDEFGHIJKLM'
B='XYZ0123EFGHIJAB'
returns true (they share the 6-character contiguous match 'EFGHIJ')

Example 2:

A='ABCDEFGHJKLM'
B='XYZ0123EFGHAB'
returns false (their longest contiguous match, 'EFGH', has only 4 characters)
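To pin down the rule, here is a slow but direct reference check. It is a sketch that assumes "r-contiguous match" means some length-r substring of A also occurs somewhere in B; the anonymous function name `rcontig` is hypothetical. It slides a length-r window over A and asks `strfind` whether that window occurs in B, which is useful for validating faster methods on small inputs.

```matlab
% Hypothetical reference check: true when some length-r window of A occurs in B.
% If numel(A) < r the window range is empty and the result is false.
rcontig = @(A, B, r) any(arrayfun(@(k) ~isempty(strfind(B, A(k:k+r-1))), 1:numel(A)-r+1));

out1 = rcontig('ABCDEFGHIJKLM', 'XYZ0123EFGHIJAB', 6);   % Example 1: true ('EFGHIJ')
out2 = rcontig('ABCDEFGHJKLM', 'XYZ0123EFGHAB', 6);      % Example 2: false
```

This costs one `strfind` call per window of A, so it is only meant as a correctness baseline, not as the fast solution being asked for.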

What is the fastest way to do this in MATLAB, given that my data is huge? Thanks.

Upvotes: 3

Views: 132

Answers (1)

Divakar

Reputation: 221614

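Case: Input strings with unique characters

Here's one approach with ismember & strfind. It marks which characters of A occur anywhere in B and then measures runs of consecutive marks, so it also assumes that such a run of A appears in the same order, contiguously, in B (as it does in both examples) -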

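matches = ismember(A,B);               % true where A(k) occurs anywhere in B; OR any(bsxfun(@eq,A,B.'),1)
matches_ext = [0 matches 0];           % zero-pad so every run has a start and an end

starts = strfind(matches_ext,[0 1]);   % original index of the first element of each run
stops = strfind(matches_ext,[1 0]);    % one past the original index of each run's last element
interval_lens = stops - starts;        % run lengths

out = any(interval_lens >= r)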
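A trace of the strfind approach on Example 1 with r = 6 makes the index arithmetic concrete. Like the answer, it relies on `strfind` accepting numeric vectors; the commented values are what each line produces for this input.

```matlab
A = 'ABCDEFGHIJKLM';  B = 'XYZ0123EFGHIJAB';  r = 6;   % Example 1

matches = ismember(A, B);               % [1 1 0 0 1 1 1 1 1 1 0 0 0]
matches_ext = [0 matches 0];            % zero-pad so every run has a start and an end
starts = strfind(matches_ext, [0 1]);   % [1 5]: first index of each run
stops = strfind(matches_ext, [1 0]);    % [3 11]: one past the last index of each run
interval_lens = stops - starts;         % [2 6]: runs 'AB' and 'EFGHIJ'
out = any(interval_lens >= r);          % true
```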

Here's another with diff & find instead of strfind -

matches = ismember(A,B);                          % OR any(bsxfun(@eq,A,B.'),1)
matches_ext = [0 matches 0];

df = diff(matches_ext);                           % +1 where a run starts, -1 just after it ends
interval_lens = find(df == -1) - find(df == 1);   % run lengths

out = any(interval_lens >= r)
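Running the diff variant on Example 2 with r = 6 shows why it returns false. The intermediate names `run_starts` and `run_ends` are illustrative only; the commented values are what each line produces for this input.

```matlab
A = 'ABCDEFGHJKLM';  B = 'XYZ0123EFGHAB';  r = 6;   % Example 2

matches = ismember(A, B);                % [1 1 0 0 1 1 1 1 0 0 0 0]
matches_ext = [0 matches 0];
df = diff(matches_ext);                  % [1 0 -1 0 1 0 0 0 -1 0 0 0 0]
run_starts = find(df == 1);              % [1 5]
run_ends = find(df == -1);               % [3 9]
interval_lens = run_ends - run_starts;   % [2 4]: runs 'AB' and 'EFGH'
out = any(interval_lens >= r);           % false
```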

Here's another with 1D convolution -

matches = ismember(A,B);                            % OR any(bsxfun(@eq,A,B.'),1)
out = any(conv(double(matches),ones(1,r)) == r)     % a sliding sum over r elements equals r only across r consecutive matches
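The `ismember`-based methods in this case only record whether each character of A occurs somewhere in B, not where. A small hypothetical counterexample with unique characters in reverse order shows the consequence: the convolution test reports a match even though no length-6 substring is shared. Because numel(A) equals r here, the only length-r window of A is A itself, so a single `strfind` call gives the exact answer.

```matlab
% Hypothetical input: same unique characters, reversed order.
A = 'ABCDEF';
B = 'FEDCBA';
r = 6;

matches = ismember(A, B);                                % all true
out_conv = any(conv(double(matches), ones(1, r)) == r);  % true: six consecutive marks in A
out_exact = ~isempty(strfind(B, A));                     % false: 'ABCDEF' never occurs in B
```

So these shortcuts are exact only when matched runs of A also appear in the same order, contiguously, in B; otherwise the diagonal-based check for general strings is the safer choice.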

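Case: Input strings with non-unique characters

Here's one approach using bsxfun. It builds the numel(B)-by-numel(A) equality matrix and looks for r consecutive true values along any diagonal, which is exactly a shared run of r contiguous characters -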

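matches = bsxfun(@eq,A,B.');           % matches(i,j) is true when B(i) == A(j)
[nB, nA] = size(matches);
intv = (0:r-1)*(nB+1);                 % linear-index offsets one step down a diagonal
fits = false(nB, nA);
fits(1:nB-r+1, 1:nA-r+1) = true;       % starts whose length-r diagonal stays inside the matrix
idx = find(matches(:) & fits(:));      % candidate diagonal starts (column vector)
out = any(all(matches(bsxfun(@plus,idx,intv)),2))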
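Since the data is huge, the full numel(B)-by-numel(A) matrix from `bsxfun(@eq,A,B.')` can become the bottleneck. A sketch of an alternative that is not from the answer: the standard dynamic-programming recurrence for the longest common substring, L(i,j) = L(i-1,j-1) + 1 when B(i) == A(j) and 0 otherwise, keeping only one row at a time. Memory drops to O(numel(A)); time is still O(numel(A)*numel(B)), with each row vectorized. The names `prev`, `curr` and `same` are illustrative.

```matlab
A = 'ABCDEFGHIJKLM';  B = 'XYZ0123EFGHIJAB';  r = 6;   % Example 1

nA = numel(A);
prev = zeros(1, nA + 1);   % prev(j+1): common-suffix length ending at B(i-1) and A(j)
out = false;
for i = 1:numel(B)
    same = (A == B(i));                          % which A(j) equal the current B(i)
    curr = zeros(1, nA + 1);
    curr(2:end) = (prev(1:end-1) + 1) .* same;   % extend the diagonal run, or reset to 0
    if any(curr >= r)                            % a shared run of length r was found
        out = true;
        break
    end
    prev = curr;
end
```

The early `break` stops as soon as a run of length r is found, which helps when matches tend to occur early in B.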

Upvotes: 4
