I want to calculate the probability of several sequences in a Markov chain. I have the transition matrix ready, but I am not sure how to calculate the probability of a specific sequence easily.
My pandas DataFrame, called markov, has the states A-E both as the index (rows) and as the columns, and looks as follows:
     A    B    C    D    E
A  0.3  0.2  0.5  0.0  0.2
B  0.2  0.4  0.0  0.0  0.4
C  0.5  0.4  0.0  0.1  0.0
D  0.2  0.2  0.2  0.2  0.2
E  0.6  0.1  0.1  0.1  0.1
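To make the examples in this thread runnable, the table can be rebuilt as a DataFrame. This is a sketch that copies the values exactly as posted; the list name states is just an illustrative choice. Rows are the current state and columns the next state. Note that each row of a transition matrix should sum to 1, and row A as posted sums to 1.2 (0.3 + 0.2 + 0.5 + 0.0 + 0.2), so one of its values is probably a typo.

```python
import pandas as pd

# Transition matrix copied from the table: rows = current state, columns = next state.
states = ["A", "B", "C", "D", "E"]
markov = pd.DataFrame(
    [[0.3, 0.2, 0.5, 0.0, 0.2],
     [0.2, 0.4, 0.0, 0.0, 0.4],
     [0.5, 0.4, 0.0, 0.1, 0.0],
     [0.2, 0.2, 0.2, 0.2, 0.2],
     [0.6, 0.1, 0.1, 0.1, 0.1]],
    index=states,
    columns=states,
)

# Sanity check: every row of a valid transition matrix sums to 1.
print(markov.sum(axis=1))
```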
Let's say I want the probability of the sequence ['A', 'C', 'D'], stored in a variable called sequence. That means the transitions A to C and C to D, so it should come out to 0.5 * 0.1 = 0.05.
I managed this with the pandas .at accessor:
markov.at[sequence[0], sequence[1]] * markov.at[sequence[1], sequence[2]]
However, I would like to build a function that takes a table with one sequence per row (the sequences vary in length) and calculates the probability of each sequence. With my current approach I have to edit the code by hand every time I want to check a different sequence.
How can I achieve this? Am I overlooking a built-in pandas feature for this kind of calculation?
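I'm not aware of a dedicated pandas method for chain probabilities, but the per-pair .at lookups can be replaced by one vectorized step: convert the labels to integer positions with Index.get_indexer, then let NumPy fancy indexing pick every (current, next) entry at once and multiply them. A sketch under those assumptions; sequence_prob is a hypothetical helper name, and the matrix is rebuilt from the table so the block runs on its own.

```python
import pandas as pd

states = ["A", "B", "C", "D", "E"]
markov = pd.DataFrame(
    [[0.3, 0.2, 0.5, 0.0, 0.2],
     [0.2, 0.4, 0.0, 0.0, 0.4],
     [0.5, 0.4, 0.0, 0.1, 0.0],
     [0.2, 0.2, 0.2, 0.2, 0.2],
     [0.6, 0.1, 0.1, 0.1, 0.1]],
    index=states,
    columns=states,
)

def sequence_prob(seq, matrix=markov):
    """Probability of following seq, given that the chain starts in seq[0]."""
    labels = list(seq)
    # Integer positions of each label; get_indexer returns -1 for unknown labels.
    rows = matrix.index.get_indexer(labels)
    cols = matrix.columns.get_indexer(labels)
    if (rows < 0).any() or (cols < 0).any():
        raise KeyError(f"unknown state in {seq!r}")
    values = matrix.to_numpy()
    # Pair each state (row) with its successor (column) and multiply the entries.
    # A sequence with fewer than two states gives an empty product, i.e. 1.0.
    return float(values[rows[:-1], cols[1:]].prod())

print(sequence_prob(["A", "C", "D"]))
```

Because it accepts any iterable of labels, a string such as "ACDE" also works when every state is a single character.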
You could define a function like this:
def get_prob(*args):
    # Multiply the transition probabilities of each consecutive pair of states.
    ret = 1
    for i, j in zip(args, args[1:]):
        ret *= markov.at[i, j]
    return ret
And then call:
get_prob('A','C','D')
# 0.05
get_prob('A', 'C', 'D', 'E')
# 0.010000000000000002
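For long sequences, multiplying many probabilities below 1 can underflow to 0.0. A common workaround, not part of the answer above, is to sum log-probabilities instead. This is a sketch assuming the same markov matrix (rebuilt here); get_log_prob is a hypothetical name, and an impossible transition (probability 0) returns -inf instead of calling log(0).

```python
import math

import pandas as pd

states = ["A", "B", "C", "D", "E"]
markov = pd.DataFrame(
    [[0.3, 0.2, 0.5, 0.0, 0.2],
     [0.2, 0.4, 0.0, 0.0, 0.4],
     [0.5, 0.4, 0.0, 0.1, 0.0],
     [0.2, 0.2, 0.2, 0.2, 0.2],
     [0.6, 0.1, 0.1, 0.1, 0.1]],
    index=states,
    columns=states,
)

def get_log_prob(*args):
    """Sum of log transition probabilities along the sequence args."""
    total = 0.0
    for i, j in zip(args, args[1:]):
        p = markov.at[i, j]
        if p == 0:
            return -math.inf  # impossible transition; math.log(0) would raise
        total += math.log(p)
    return total

lp = get_log_prob("A", "C", "D", "E")
print(lp, math.exp(lp))  # exp() converts back to an ordinary probability
```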
Or you can do:
def get_prob2(lst):
    ret = 1
    for i, j in zip(lst, lst[1:]):
        ret *= markov.at[i, j]
    return ret
Then you can pass a single string (or a list). Passing a string only works because every state label here is a single character:
get_prob2('ACDE')
# 0.010000000000000002
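The question also asks about a table with one sequence per row. Assuming that table is a DataFrame whose column (hypothetically named sequence) holds lists or strings of state labels, Series.apply can map get_prob2 over every row. A sketch, with the matrix and function repeated so it runs on its own; the example rows are made up for illustration.

```python
import pandas as pd

states = ["A", "B", "C", "D", "E"]
markov = pd.DataFrame(
    [[0.3, 0.2, 0.5, 0.0, 0.2],
     [0.2, 0.4, 0.0, 0.0, 0.4],
     [0.5, 0.4, 0.0, 0.1, 0.0],
     [0.2, 0.2, 0.2, 0.2, 0.2],
     [0.6, 0.1, 0.1, 0.1, 0.1]],
    index=states,
    columns=states,
)

def get_prob2(lst):
    # Multiply the transition probabilities of each consecutive pair of states.
    ret = 1
    for i, j in zip(lst, lst[1:]):
        ret *= markov.at[i, j]
    return ret

# Hypothetical table: one sequence per row, with varying lengths.
sequences = pd.DataFrame({"sequence": [["A", "C", "D"], ["A", "C", "D", "E"], "EAB"]})
# Series.apply calls get_prob2 once per row and collects the results.
sequences["probability"] = sequences["sequence"].apply(get_prob2)
print(sequences)
```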