Reputation: 489
Suppose I want to predict if a person is of class1=healthy
or of class2= fever
. I have a data set with the following domain: {normal,cold,dizzy}
The transition matrix would contain the probability of transition generated from our training dataset while the initial vector would contain the probability that a person starts(day1) with a state x from the domain {normal,cold,dizzy}
, again this is also generated from our training set.
If I want to build a first order markov chain, I would generate a 3x3 transition matrix and a 1x3 initial vector per class like so:
> TransitionMatrix
normal cold dizzy
normal NA NA NA
cold NA NA NA
dizzy NA NA NA
>Initial Vector
normal cold dizzy
[1,] NA NA NA
The NA will be filled with the corresponding probabilities.
1-My question is about transition matrices in higher order chain. For example in second order MC would we have a transition matrix of size domain²xdomain²
like so:
normal->normal normal->cold normal->dizzy cold->normal cold->cold cold->dizzy dizzy->normal dizzy->cold dizzy->dizzy
normal->normal NA NA NA NA NA NA NA NA NA
normal->cold NA NA NA NA NA NA NA NA NA
normal->dizzy NA NA NA NA NA NA NA NA NA
cold->normal NA NA NA NA NA NA NA NA NA
cold->cold NA NA NA NA NA NA NA NA NA
cold->dizzy NA NA NA NA NA NA NA NA NA
dizzy->normal NA NA NA NA NA NA NA NA NA
dizzy->cold NA NA NA NA NA NA NA NA NA
dizzy->dizzy NA NA NA NA NA NA NA NA NA
here the cell (1,1)
represents the following sequence: normal->normal->normal->normal
or would it instead be just domain²xdomain
like so:
normal cold dizzy
normal->normal NA NA NA
normal->cold NA NA NA
normal->dizzy NA NA NA
cold->normal NA NA NA
cold->cold NA NA NA
cold->dizzy NA NA NA
dizzy->normal NA NA NA
dizzy->cold NA NA NA
dizzy->dizzy NA NA NA
here the cell (1,1)
represents normal->normal->normal
which is different from the previous representation
2-What about the initial vector for a MC of degree 2. Would we need two initial vectors of size 1xdomain
like so:
normal cold dizzy
[1,] NA NA NA
leading to two initial vectors per class. the first giving the probability of occurrence of {normal,cold,dizzy}
on the first day for the healthy/fever
class while the second gives the probability of occurrence on the second day for the healthy/fever
. this would give 4 initial vectors.
OR would we just need one initial vector of size 1xdomain²
like so:
normal->normal normal->cold normal->dizzy cold->normal cold->cold cold->dizzy dizzy->normal dizzy->cold dizzy->dizzy
[1,] NA NA NA NA NA NA NA NA NA
I can see how the second way of representing the initial vector would be problematic in case we want to classify an observation with only one state.
Upvotes: 7
Views: 5481
Reputation: 76391
Say the set of spaces is S. Typically, in the nth order,
The transition matrix has dimensions |S|n X |S|. This is because given the current n history of states, we need the probability of the single next state. It is true that this single next state induces another compound state of history n, but the transition itself is to the single next state. See this example in Wikipedia, e.g..
The initial distribution is a distribution over |S|n elements (your second option).
Upvotes: 3