iAmWanteD

Reputation: 303

Separating a string for different substrings

Suppose I have the string akobabyd. How can I build an array of all its 3-character substrings (a sliding window that advances one character at a time) without using a for loop? Expected output: ako kob oba bab aby byd

*This is NOT homework, just a step I need on the way towards a solution.

Thanks

Upvotes: 1

Views: 79

Answers (3)

rayryeng

Reputation: 104503

If you can use built-in functions, you can use hankel to generate a matrix of sliding-window indices, then use it to extract three characters at a time into a 2D matrix where each row is a 3-character substring. In general, suppose you want substrings of length len (in our case, len = 3). If we do:

s = 'akobabyd';
len = 3;
ind = hankel(1:len, len:length(s))

We would get:

ind =

     1     2     3     4     5     6
     2     3     4     5     6     7
     3     4     5     6     7     8

You can see that each column holds three consecutive indices, and that each window is shifted by one position relative to the previous one, so consecutive windows overlap in two positions. We can use these indices to access the corresponding characters in our string and produce a 2D array of characters. However, we want each substring in a row, not a column, so we transpose the index matrix before indexing into the string.

Therefore:

s = 'akobabyd';
len = 3;
subseqs = s(hankel(1:len, len:length(s)).')

subseqs =

ako
kob
oba
bab
aby
byd

This generalizes to whichever substring length you want. Just change len.
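For instance, a quick check with len = 4 on the same string might look like this (a sketch; the variable name subseqs4 is introduced only for illustration):

```matlab
s = 'akobabyd';
len = 4;                                      % window length
% Each row of the transposed index matrix is one window of len positions
subseqs4 = s(hankel(1:len, len:length(s)).');
disp(subseqs4)
% akob
% koba
% obab
% baby
% abyd
```

The number of rows is length(s) - len + 1, so a longer window yields fewer substrings.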

To access a particular substring, say the one in row idx, you would just do:

t = subseqs(idx,:);

Edit

You said you wanted to do this without using hankel. Looking at the hankel source (abridged here; nc is the length of c, set in a part that is not shown), this is what we get:

function H = hankel(c,r)

r = r(:);                       %-- force column structure
nr = length(r);

x = [ c; r((2:nr)') ];          %-- build vector of user data

cidx = (ones(class(c)):nc)';
ridx = zeros(class(r)):(nr-1);
H = cidx(:,ones(nr,1)) + ridx(ones(nc,1),:);  % Hankel subscripts
H(:) = x(H);                            % actual data

You can see that it only uses ones and zeros, as well as class to ensure that the output has the same data type as the input. We can simplify this because we know only numeric data (specifically double) is coming in. The simplified version of the hankel code, together with the extraction of the characters you want, would be:

s = 'akobabyd'; %// Define string here
len = 3; %// Length of each substring

%// Hankel starts here
c = (1 : len).'; 
r = (len : length(s)).';
nr = length(r);
nc = length(c);

x = [ c; r((2:nr)') ];          %-- build vector of user data

cidx = (1:nc)';
ridx = 0:(nr-1);
H = cidx(:,ones(nr,1)) + ridx(ones(nc,1),:);  % Hankel subscripts
ind = x(H);                            % actual data
%// End Hankel script

%// Now get our data
subseqs = s(ind.');
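As a side note, in this particular use x works out to 1:length(s), so x(H) returns H itself and the index matrix is just the sum of the in-window positions and the window start offsets. A minimal sketch of that shortcut, assuming the same s and len as above (the name ind2 is hypothetical):

```matlab
s = 'akobabyd';
len = 3;
nr = length(s) - len + 1;              % number of windows
cidx = (1:len).';                      % positions within a window (column)
ridx = 0:(nr-1);                       % start offset of each window (row)
% Replicate both vectors by indexing, as hankel does, then add them
ind2 = cidx(:, ones(nr,1)) + ridx(ones(len,1), :);
subseqs = s(ind2.');                   % one substring per row
```

This only holds because the values passed in are consecutive integers starting at 1; for arbitrary c and r the lookup through x is still needed.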

Upvotes: 3

Luis Mendo

Reputation: 112679

One-line solution with the mighty bsxfun function:

s = 'akobabyd'; %// input string
n = 3; %// number of chars of each substring
result = s(bsxfun(@plus, 1:n, (0:(numel(s)-n)).'));
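For reference, here is what that line produces, together with an equivalent form that relies on implicit expansion instead of bsxfun. The second form assumes MATLAB R2016b or later (or Octave, which broadcasts automatically); result2 is a name introduced only for this sketch:

```matlab
s = 'akobabyd';
n = 3;
% Column of window start offsets (0..5) plus row of in-window positions (1..3)
result = s(bsxfun(@plus, 1:n, (0:(numel(s)-n)).'));
% Same index matrix via implicit expansion (assumes R2016b or later)
result2 = s((1:n) + (0:(numel(s)-n)).');
disp(result)
% ako
% kob
% oba
% bab
% aby
% byd
```

Both forms build the same 6-by-3 index matrix, so the choice mainly depends on which MATLAB versions need to be supported.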

Upvotes: 2

Marcin

Reputation: 238219

What about this one:

A = 'akobabyd';

C = arrayfun(@(ii) A(ii-1:ii+1), [2:numel(A)-1] , 'UniformOutput', 0);
C(:)

ans = 

    'ako'
    'kob'
    'oba'
    'bab'
    'aby'
    'byd'
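Note that this returns a cell array of strings rather than a character matrix. If a 2D char array is preferred, the cells can be stacked vertically, since all substrings have the same length. A minimal sketch (the name M is hypothetical):

```matlab
A = 'akobabyd';
% One cell per window, centred on positions 2 .. numel(A)-1
C = arrayfun(@(ii) A(ii-1:ii+1), 2:numel(A)-1, 'UniformOutput', false);
M = vertcat(C{:});   % 6-by-3 char array, one substring per row
```

After this, M(idx,:) gives the substring in row idx, just like the matrix-based answers.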

Upvotes: 2
