Lina
Lina

Reputation: 59

How to generate a random sequence given a probability matrix of transitions?

The script below produces a probability matrix for a given list:

transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']

def rank(c):
   return ord(c) - ord('A')

T = [rank(c) for c in transitions]

#create matrix of zeros

M = [[0]*4 for _ in range(4)]

for (i,j) in zip(T,T[1:]):
   M[i][j] += 1

#now convert to probabilities:
for row in M:
   n = sum(row)
   if n > 0:
       row[:] = [f/sum(row) for f in row]

#print M:
for row in M:
   print(row)

output

[0.0, 0.5, 0.0, 0.5]
[0.5, 0.25, 0.25, 0.0]
[0.0, 1.0, 0.0, 0.0]
[0.5, 0.0, 0.0, 0.5]

I now want to do the opposite, and make a new transition list of A B C D following the probability matrix.
How can I make this happen?

Upvotes: 3

Views: 1273

Answers (2)

JohanC
JohanC

Reputation: 80279

The choices function of the random library could be of help. As the question doesn't indicate how to choose the first letter, here it is chosen with the same probability as the contents of the original list.

Since Python 3.6 random.choices accepts a parameter with weights. It is not strictly necessary to normalize them.

import random

letter = random.choice(transitions)  # take a starting letter with the same weights as the original list
new_list = [letter]
for _ in range(len(transitions) - 1):
    letter = chr(random.choices(range(4), weights=M[rank(letter)])[0] + ord('A'))
    new_list.append(letter)
print(new_list)

The full code could be generalized somewhat to work with any kind of nodes, not just consecutive letters:

from _collections import defaultdict
import random

transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']

nodes = sorted(set(transitions))  # a list of all letters used
M = defaultdict(int)  # dictionary counting the occurrences for each transition i,j)

for (i, j) in zip(transitions, transitions[1:]):
    M[(i, j)] += 1

# dictionary with for each node a list of frequencies for the transition to a next node
T = {i: [M[(i, j)] for j in nodes] for i in nodes}

# node = random.choice(transitions) # chose the first node randomly with the same probability as the original list
node = random.choice(nodes) # chose the first node randomly, each node with equal probability
new_list = [node]
for _ in range(9):
    node = random.choices(nodes, T[node])[0]
    new_list.append(node)

print(new_list)

Example output:['D', 'A', 'D', 'A', 'D', 'D', 'A', 'D', 'A', 'B']

Upvotes: 1

Tim Stack
Tim Stack

Reputation: 3248

It seems to me you are attempting to create a Markov Model. I happen to have some experience with (Hidden) Markov Models as a bioinformatician student, and I would therefore use nested dictionaries to simplify working with the matrix. Note that I have imported the numpy.random function.

Hope this helps!

import numpy.random as rnd

alphabet = ['A', 'B', 'C', 'D']
transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']

# Create probability matrix filled with zeroes
# Matrix consists of nested libraries
prob_matrix = {}
for i in alphabet:
    prob_matrix[i] = {}
    for j in alphabet:
        prob_matrix[i][j] = 0.0

def rank(c):
   return ord(c) - ord('A')

# fill matrix with numbers based on transitions list
T = [rank(c) for c in transitions]
for (i,j) in zip(T,T[1:]):
    prob_matrix[alphabet[i]][alphabet[j]] += 1

# convert to probabilities
for row in prob_matrix:
   total = sum([prob_matrix[row][column] for column in prob_matrix[row]])
   if total > 0:
       for column in prob_matrix[row]:
           prob_matrix[row][column] /= total

# generate first random sequence letter
outputseq = rnd.choice(alphabet, None)

# generate rest of string based on probability matrix
for i in range(11):
    probabilities = [prob_matrix[outputseq[-1]][j] for j in alphabet]
    outputseq += rnd.choice(alphabet, None, False, probabilities)

# output generated sequence
print(outputseq)

Upvotes: 1

Related Questions