Reputation: 629
Imagine I have a sequence over 4 possible Markov states (A, B, C, D):
X = [A, B, B, C, B, A, D, D, A, B, A, D, ....]
How can I generate a Markov transition matrix using Python? The matrix must be 4 by 4, showing the probability of moving from each state to each of the 4 states (including staying in the same state). I've looked at many examples online, but in all of them the matrix is given, not calculated from data. I also looked into hmmlearn, but I couldn't find anything on how to get it to output the transition matrix. Is there a library I can use for this purpose?
Here is R code that does exactly what I am trying to do in Python: https://stats.stackexchange.com/questions/26722/calculate-transition-matrix-markov-in-r
Upvotes: 24
Views: 54065
Reputation: 1440
In Pandas there is a much easier solution: pd.crosstab. Given your sequence:
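import pandas as pd

X = ["A", "B", "B", "C", "B", "A", "D", "D", "A", "B", "A", "D"]

# rows = current state ('from'), columns = next state ('to');
# normalize=0 divides each row by its total
matrix = pd.crosstab(
    pd.Series(X[:-1], name='from'),
    pd.Series(X[1:], name='to'),
    normalize=0
)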
Resulting in the following pd.DataFrame:
to      A     B     C    D
from
A     0.0  0.50  0.00  0.5
B     0.5  0.25  0.25  0.0
C     0.0  1.00  0.00  0.0
D     0.5  0.00  0.00  0.5
If you want a np.array instead, use matrix.to_numpy(), which results in:
[[0. 0.5 0. 0.5 ]
[0.5 0.25 0.25 0. ]
[0. 1. 0. 0. ]
[0.5 0. 0. 0.5 ]]
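Note that pd.crosstab only creates rows and columns for states that actually occur as a source or a target, so a state that never appears (or only appears as the last element) will be missing from the matrix. A minimal sketch, assuming the full list of states is known up front (the `states` list and the unseen state "E" are assumptions for illustration), that forces a square matrix with zero rows/columns for unseen states using `DataFrame.reindex`:

```python
import pandas as pd

X = ["A", "B", "B", "C", "B", "A", "D", "D", "A", "B", "A", "D"]
states = ["A", "B", "C", "D", "E"]  # assumed full state space; "E" never occurs

matrix = pd.crosstab(
    pd.Series(X[:-1], name="from"),
    pd.Series(X[1:], name="to"),
    normalize=0,  # each observed row sums to 1
)
# add missing states as all-zero rows/columns, in a fixed order
matrix = matrix.reindex(index=states, columns=states, fill_value=0.0)
print(matrix)
```

Rows for states that were never followed by anything stay all zero rather than summing to 1, so check for that before sampling from the matrix.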
Upvotes: 5
Reputation: 190
Thanks @john-coleman, I have updated your code to use NumPy:
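import numpy as np

def transition_matrix(transitions):
    n = 1 + max(transitions)  # number of states
    M = np.zeros((n, n))
    # count each observed transition i -> j
    for (i, j) in zip(transitions, transitions[1:]):
        M[i][j] += 1
    # now convert to probabilities; rows of states with no outgoing
    # transition stay all zero instead of becoming NaN (0/0)
    row_sums = M.sum(axis=1, keepdims=True)
    M = np.divide(M, row_sums, out=np.zeros_like(M), where=row_sums != 0)
    return M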
t = [1,1,2,6,8,5,5,7,8,8,1,1,4,5,5,0,0,0,1,1,4,4,5,1,3,3,4,5,4,1,1]
m = transition_matrix(t)
for row in m: print(' '.join(f'{x:.2f}' for x in row))
The output is the same:
0.67 0.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.50 0.12 0.12 0.25 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00
0.00 0.00 0.00 0.50 0.50 0.00 0.00 0.00 0.00
0.00 0.20 0.00 0.00 0.20 0.60 0.00 0.00 0.00
0.17 0.17 0.00 0.00 0.17 0.33 0.00 0.17 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
0.00 0.33 0.00 0.00 0.00 0.33 0.00 0.00 0.33
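The Python loop that fills M can also be replaced by a single vectorized call. A sketch using `np.add.at`, which (unlike `M[rows, cols] += 1` with fancy indexing) counts repeated (from, to) pairs correctly; like the function above, it assumes states are integers 0..n-1, and the name `transition_matrix_vectorized` is just illustrative:

```python
import numpy as np

def transition_matrix_vectorized(transitions):
    t = np.asarray(transitions)
    n = t.max() + 1  # number of states, assuming labels 0..n-1
    counts = np.zeros((n, n))
    # unbuffered in-place add: every occurrence of a (from, to) pair is counted
    np.add.at(counts, (t[:-1], t[1:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # divide only rows that have outgoing transitions; the others stay 0
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums != 0)

t = [1,1,2,6,8,5,5,7,8,8,1,1,4,5,5,0,0,0,1,1,4,4,5,1,3,3,4,5,4,1,1]
m = transition_matrix_vectorized(t)
for row in m:
    print(' '.join(f'{x:.2f}' for x in row))
```

This should print the same table as above; the gain only matters for long sequences.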
Upvotes: 0
Reputation: 21
The following code provides another solution for an order-1 Markov transition matrix. Your data can be a list of integers, a list of strings, or a string. The downside is that this solution is probably slower and uses more memory than the alternatives.
The function transition_matrix_order1 below answers the question; the rest of the code solves an additional problem: generating new data from the trained Markov chain.
import random
import pandas as pd

def transition_matrix_order1(data):
    # sorted list of distinct states (ints, strings, or characters of a string)
    alphabet = sorted(set(data))
    # matrix.loc[a, b] counts transitions a -> b (rows = from, columns = to)
    matrix = pd.DataFrame(0.0, index=alphabet, columns=alphabet)
    for previous, current in zip(data, data[1:]):
        matrix.loc[previous, current] += 1.0
    # normalise each row so it sums to 1; states with no outgoing
    # transition get an all-zero row instead of NaN
    matrix = matrix.div(matrix.sum(axis=1), axis=0).fillna(0.0)
    return matrix, alphabet

#create data using random integers========
#(you can also pass a list of strings or a string as input data)
data = [random.randint(1, 5) for _ in range(1000)]

#create markov transition matrix order 1 (bigram)
markov_matrix, alphabet = transition_matrix_order1(data)

#=the following code uses the probabilities in order to create new data.=
#transform each row of the transition matrix to cumulative probabilities
cumulative = markov_matrix.cumsum(axis=1)

#generating 30 data points after a random starting state
generated_data = []
feed = random.choice(alphabet)
generated_data.append(feed)
for _ in range(30):
    random_value = random.uniform(0, 1)
    for state in alphabet:
        if cumulative.loc[feed, state] >= random_value:
            generated_data.append(state)
            feed = state
            break
print(generated_data)
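Instead of building cumulative probabilities by hand, Python's `random.choices` accepts a `weights` argument and does the cumulative lookup internally. A minimal sketch of that swapped-in technique, assuming a row-stochastic DataFrame with rows = current state and columns = next state (the helper name `generate` is hypothetical):

```python
import random
import pandas as pd

def generate(matrix, start, length, rng=random):
    """Hypothetical helper: sample up to `length` further states from a
    row-stochastic DataFrame (rows = current state, columns = next state)."""
    states = list(matrix.columns)
    out = [start]
    current = start
    for _ in range(length):
        weights = matrix.loc[current].tolist()
        if sum(weights) == 0:  # no observed successor for this state: stop early
            break
        # random.choices does the cumulative-probability lookup internally
        current = rng.choices(states, weights=weights, k=1)[0]
        out.append(current)
    return out

data = [random.randint(1, 5) for _ in range(1000)]
states = sorted(set(data))
matrix = pd.crosstab(pd.Series(data[:-1]), pd.Series(data[1:]), normalize=0)
# make the matrix square so every state has a row and a column
matrix = matrix.reindex(index=states, columns=states, fill_value=0.0)
print(generate(matrix, start=data[0], length=30))
```

Passing a seeded `random.Random(...)` instance as `rng` makes the generated sequence reproducible.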
Upvotes: 2
Reputation: 507
If you want to do it all in pandas, here is an approach that works for non-numeric data:
import pandas as pd
transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']
df = pd.DataFrame(transitions)
# create a new column with data shifted one space
df['shift'] = df[0].shift(-1)
# add a count column (for group by function)
df['count'] = 1
# group by (state, next state), count, unstack, and fill missing pairs with 0
trans_mat = df.groupby([0, 'shift']).count().unstack().fillna(0)
# divide each row by its total number of occurrences and take the values to get the transition matrix
trans_mat = trans_mat.div(trans_mat.sum(axis=1), axis=0).values
It's slower than the pure-Python approach, but may be worth it for the flexibility and to avoid writing your own function.
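The `.values` at the end drops the state labels. If you would rather keep them, a variant sketch (not the answer's exact code) counts pairs with `groupby(...).size()` and pivots with `unstack(fill_value=0)`, leaving a labelled DataFrame whose rows sum to 1:

```python
import pandas as pd

transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']
df = pd.DataFrame({'from': transitions[:-1], 'to': transitions[1:]})

# number of times each (from, to) pair occurs, pivoted into a table
counts = df.groupby(['from', 'to']).size().unstack(fill_value=0)
# divide each row by its total to turn counts into probabilities
trans_mat = counts.div(counts.sum(axis=1), axis=0)
print(trans_mat)
```

Building the 'from'/'to' columns by slicing avoids the trailing NaN that `shift(-1)` produces.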
Upvotes: 16
Reputation: 51988
This might give you some ideas:
transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']

def rank(c):
    return ord(c) - ord('A')

T = [rank(c) for c in transitions]

#create matrix of zeros
M = [[0]*4 for _ in range(4)]

for (i,j) in zip(T,T[1:]):
    M[i][j] += 1

#now convert to probabilities:
for row in M:
    n = sum(row)
    if n > 0:
        row[:] = [f/n for f in row]

#print M:
for row in M:
    print(row)
output:
[0.0, 0.5, 0.0, 0.5]
[0.5, 0.25, 0.25, 0.0]
[0.0, 1.0, 0.0, 0.0]
[0.5, 0.0, 0.0, 0.5]
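The normalisation step computes the usual maximum-likelihood estimate of the transition probabilities: count the transitions out of each state and divide by that row's total. With n_ij the number of times state i is immediately followed by state j:

```latex
\hat{P}_{ij} = \frac{n_{ij}}{\sum_{k} n_{ik}}

% Worked example, row B: in the sequence, B is followed by B, C, A, A, so
% n_{BA} = 2,\; n_{BB} = 1,\; n_{BC} = 1,\; n_{BD} = 0,\; \sum_k n_{Bk} = 4
\hat{P}_{B\cdot} = \left(\tfrac{2}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{0}{4}\right) = (0.5,\ 0.25,\ 0.25,\ 0)
```

which is the second printed row.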
Edit: here is a function that implements the above ideas:
#the following code takes a list such as
#[1,1,2,6,8,5,5,7,8,8,1,1,4,5,5,0,0,0,1,1,4,4,5,1,3,3,4,5,4,1,1]
#with states labeled as successive integers starting with 0
#and returns a transition matrix, M,
#where M[i][j] is the probability of transitioning from i to j
def transition_matrix(transitions):
    n = 1+ max(transitions) #number of states

    M = [[0]*n for _ in range(n)]

    for (i,j) in zip(transitions,transitions[1:]):
        M[i][j] += 1

    #now convert to probabilities:
    for row in M:
        s = sum(row)
        if s > 0:
            row[:] = [f/s for f in row]
    return M
#test:
t = [1,1,2,6,8,5,5,7,8,8,1,1,4,5,5,0,0,0,1,1,4,4,5,1,3,3,4,5,4,1,1]
m = transition_matrix(t)
for row in m: print(' '.join('{0:.2f}'.format(x) for x in row))
Output:
0.67 0.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.50 0.12 0.12 0.25 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00
0.00 0.00 0.00 0.50 0.50 0.00 0.00 0.00 0.00
0.00 0.20 0.00 0.00 0.20 0.60 0.00 0.00 0.00
0.17 0.17 0.00 0.00 0.17 0.33 0.00 0.17 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
0.00 0.33 0.00 0.00 0.00 0.33 0.00 0.00 0.33
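The function expects states labelled 0, 1, 2, ..., and the `rank` trick above only works for single capital letters. A hedged sketch for arbitrary hashable, sortable labels: build a label-to-index mapping first, then reuse the same counting logic (the name `label_transition_matrix` is hypothetical):

```python
def label_transition_matrix(sequence):
    """Return (labels, M) where M[i][j] is the estimated probability of
    moving from labels[i] to labels[j]. Labels must be hashable and sortable."""
    labels = sorted(set(sequence))
    index = {label: k for k, label in enumerate(labels)}
    n = len(labels)
    M = [[0] * n for _ in range(n)]
    # count transitions between consecutive elements
    for a, b in zip(sequence, sequence[1:]):
        M[index[a]][index[b]] += 1
    # convert each non-empty row of counts to probabilities
    for row in M:
        s = sum(row)
        if s > 0:
            row[:] = [f / s for f in row]
    return labels, M

labels, M = label_transition_matrix("ABBCBADDABAD")
print(labels)
for row in M:
    print(row)
```

The returned `labels` list tells you which state each row and column of M belongs to.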
Upvotes: 34