happydude800
happydude800

Reputation: 43

MATLAB Basic Markov Chain implementation

I'm writing code simulate a very simple Markov Chain to generate 10000 6-nucleotide sequences from either of two transition matrices (i.e. if previous nucleotide was A, then use this set of probabilities to generate the next nucleotide, and so on). I've also got running products that I obtain by grabbing the corresponding probabilities from said matrices, but those aren't propagating correctly either.

I know that MATLAB may not treat strings/characters like other languages, so I'm not entirely sure what's going on with my code. Basically, I'm sampling, but it doesn't seem to be allocating correctly. My code is below. It's admittedly inelegant, but it should work...but it doesn't. Is there an easier way to do this, given those transition matrices? (Obviously, the code only works thus far for the first matrix (M1).)

clear all;
close all;

%length of sequences
n = 6;

%nucleotide map

N = 'ACGT';

%initial probabilities
%     A    C    G    T
M0 = [0.25 0.25 0.25 0.25];

%transition probabilities (if row, then col)
%inside CpG
%     A       C       G       T
M1 = [0.18    0.274   0.426   0.12    %A
      0.171   0.367   0.274   0.188   %C
      0.161   0.339   0.375   0.125   %G
      0.079   0.355   0.384   0.182]; %T

%outside CpG
%     A       C       G       T
M2 = [0.3     0.205   0.285   0.21    %A
      0.322   0.298   0.078   0.302   %C
      0.248   0.246   0.298   0.208   %G
      0.177   0.239   0.292   0.292]; %T

sampletype = input('Matrix Type (M1 or M2): ', 's');

sampseq = zeros(n,1);
PM1 = 1;
PM2 = 1;
count = 0;

if strcmp(sampletype,'M1')
    for i = 1:10000
        for position = 1:n
            if position == 1
                firstnuc = randsample(N,1,true,M0);
                sampseq(1) = firstnuc;

                PM1 = PM1*0.25;
                PM2 = PM2*0.25;

            %check for previous nucleotide
            elseif position ~= 1
                if strcmp(sampseq(position-1),'A')
                    tempnuc = randsample(N,1,true,M1(1,:));
                    sampseq(position) = tempnuc;

                    PM1 = PM1*M1(1,findstr(N,tempnuc));
                    PM2 = PM2*M2(1,findstr(N,tempnuc));
                elseif strcmp(sampseq(position-1),'C')
                    tempnuc = randsample(N,1,true,M1(2,:));
                    sampseq(position) = tempnuc;

                    PM1 = PM1*M1(2,findstr(N,tempnuc));
                    PM2 = PM2*M2(2,findstr(N,tempnuc));
                elseif strcmp(sampseq(position-1),'G')
                    tempnuc = randsample(N,1,true,M1(3,:));
                    sampseq(position) = tempnuc;

                    PM1 = PM1*M1(3,findstr(N,tempnuc));
                    PM2 = PM2*M2(3,findstr(N,tempnuc));
                elseif strcmp(sampseq(position-1),'T')
                    tempnuc = randsample(N,1,true,M1(4,:));
                    sampseq(position) = tempnuc;

                    PM1 = PM1*M1(4,findstr(N,tempnuc));
                    PM2 = PM2*M2(4,findstr(N,tempnuc));
                end
            end
        end

        if PM2 > PM1
            count = count + 1;
        end
        PM1
        PM2
        PM1 = 1;
        PM2 = 1;
        sampseq
        sampseq = zeros(n,1);
    end
end

count

Upvotes: 1

Views: 986

Answers (1)

happydude800
happydude800

Reputation: 43

Ah, I fixed it. To get around the bit-designations created by the 'ACGT' string, I just created placeholders:

%    A C G T
N = [1 2 3 4]

The indexing fix was trivial after that.

I am still interested in figuring out how to generate "real" sequences though, as I was originally trying to do, instead of just strings of combinations of 1s, 2s, 3s, and 4s.

Upvotes: 1

Related Questions