Reputation: 43
I'm writing code simulate a very simple Markov Chain to generate 10000 6-nucleotide sequences from either of two transition matrices (i.e. if previous nucleotide was A, then use this set of probabilities to generate the next nucleotide, and so on). I've also got running products that I obtain by grabbing the corresponding probabilities from said matrices, but those aren't propagating correctly either.
I know that MATLAB may not treat strings/characters like other languages, so I'm not entirely sure what's going on with my code. Basically, I'm sampling, but it doesn't seem to be allocating correctly. My code is below. It's admittedly inelegant, but it should work...but it doesn't. Is there an easier way to do this, given those transition matrices? (Obviously, the code only works thus far for the first matrix (M1).)
clear all;
close all;
%length of sequences
n = 6;
%nucleotide map
N = 'ACGT';
%initial probabilities
% A C G T
M0 = [0.25 0.25 0.25 0.25];
%transition probabilities (if row, then col)
%inside CpG
% A C G T
M1 = [0.18 0.274 0.426 0.12 %A
0.171 0.367 0.274 0.188 %C
0.161 0.339 0.375 0.125 %G
0.079 0.355 0.384 0.182]; %T
%outside CpG
% A C G T
M2 = [0.3 0.205 0.285 0.21 %A
0.322 0.298 0.078 0.302 %C
0.248 0.246 0.298 0.208 %G
0.177 0.239 0.292 0.292]; %T
sampletype = input('Matrix Type (M1 or M2): ', 's');
sampseq = zeros(n,1);
PM1 = 1;
PM2 = 1;
count = 0;
if strcmp(sampletype,'M1')
for i = 1:10000
for position = 1:n
if position == 1
firstnuc = randsample(N,1,true,M0);
sampseq(1) = firstnuc;
PM1 = PM1*0.25;
PM2 = PM2*0.25;
%check for previous nucleotide
elseif position ~= 1
if strcmp(sampseq(position-1),'A')
tempnuc = randsample(N,1,true,M1(1,:));
sampseq(position) = tempnuc;
PM1 = PM1*M1(1,findstr(N,tempnuc));
PM2 = PM2*M2(1,findstr(N,tempnuc));
elseif strcmp(sampseq(position-1),'C')
tempnuc = randsample(N,1,true,M1(2,:));
sampseq(position) = tempnuc;
PM1 = PM1*M1(2,findstr(N,tempnuc));
PM2 = PM2*M2(2,findstr(N,tempnuc));
elseif strcmp(sampseq(position-1),'G')
tempnuc = randsample(N,1,true,M1(3,:));
sampseq(position) = tempnuc;
PM1 = PM1*M1(3,findstr(N,tempnuc));
PM2 = PM2*M2(3,findstr(N,tempnuc));
elseif strcmp(sampseq(position-1),'T')
tempnuc = randsample(N,1,true,M1(4,:));
sampseq(position) = tempnuc;
PM1 = PM1*M1(4,findstr(N,tempnuc));
PM2 = PM2*M2(4,findstr(N,tempnuc));
end
end
end
if PM2 > PM1
count = count + 1;
end
PM1
PM2
PM1 = 1;
PM2 = 1;
sampseq
sampseq = zeros(n,1);
end
end
count
Upvotes: 1
Views: 986
Reputation: 43
Ah, I fixed it. To get around the bit-designations created by the 'ACGT' string, I just created placeholders:
% A C G T
N = [1 2 3 4]
The indexing fix was trivial after that.
I am still interested in figuring out how to generate "real" sequences though, as I was originally trying to do, instead of just strings of combinations of 1s, 2s, 3s, and 4s.
Upvotes: 1