Reputation: 41
I was trying to use the R markovchain package.
I have a question regarding the markovchainFit function and the sequence matrix.
By default, markovchainFit is run with a sequence of states as its parameter. According to the documentation, the function then converts that sequence into the sequence matrix, which can be retrieved using the createSequenceMatrix function.
My question is: can markovchainFit somehow be run with the sequence matrix as the parameter (or at least with a vector of multiple data sequences)?
I'm asking because my model has multiple absorbing states. That means a typical sequence may be short, since it ends as soon as it reaches an absorbing state. I have multiple sequences in my dataset and I'm able to create a sequence matrix based on them. Nevertheless, I don't have one long sequence that can be used as the parameter for markovchainFit (as each sequence is absorbed after a couple of states).
Terminology in my question is based on the following documentation: CRAN Introduction to markovchain package
In the weather example in that article a simple scenario is introduced. There are 3 states (sunny, cloudy, rain) and the transition matrix is given as input:
       sunny cloudy rain
sunny    0.7   0.20 0.10
cloudy   0.3   0.40 0.30
rain     0.2   0.45 0.35
Based on that matrix a Markov chain object is built:
R> weatherStates <- c("sunny", "cloudy", "rain")
R> byRow <- TRUE
R> weatherMatrix <- matrix(data = c(0.70, 0.2, 0.1,
+ 0.3, 0.4, 0.3,
+ 0.2, 0.45, 0.35), byrow = byRow, nrow = 3,
+ dimnames = list(weatherStates, weatherStates))
R> mcWeather <- new("markovchain", states = weatherStates, byrow = byRow,
+ transitionMatrix = weatherMatrix, name = "Weather")
Then a sequence of data is generated from the Markov chain, in order to demonstrate how to fit the model back from that sample:
R> weathersOfDays <- rmarkovchain(n = 365, object = mcWeather, t0 = "sunny")
Then a new Markov chain is fitted based on the data:
R> weatherFittedLAPLACE <- markovchainFit(data = weathersOfDays,
+ method = "laplace", laplacian = 0.01,
+ name = "Weather LAPLACE")
R> weatherFittedLAPLACE$estimate
The estimated results are given below, to show how close the estimate is to the original transition matrix:
          cloudy       rain     sunny
cloudy 0.3944786 0.32110428 0.2844171
rain   0.4050361 0.37972922 0.2152347
sunny  0.1932057 0.07958871 0.7272056
It is said that the fitting is based on the 'sequence matrix', which can be retrieved as follows:
R> createSequenceMatrix(stringchar = weathersOfDays)
       cloudy rain sunny
cloudy     43   35    31
rain       32   30    17
sunny      34   14   128
My problem is that I have the data in the form of multiple sequences, as there are many absorbing states and the chains are relatively short.
I would like to feed them in and have the model fitted, but the package only seems to accept a single sequence of data. Alternatively, I can construct the sequence matrix as shown above and feed it to the model, but I don't see a function in the package that could handle it.
Long story short - I have multiple short data sequences, based on which I want a markov chain model to be fitted.
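To make the setup concrete, here is a minimal base R sketch of what pooling several short sequences into one sequence matrix could look like. The example data and the helper names count_transitions and counts_to_probabilities are hypothetical and are not part of the markovchain package. Row-normalising the pooled counts gives the usual maximum-likelihood estimate of the transition probabilities; treating a state with no observed exits as absorbing is an assumption of this sketch.

```r
# Hypothetical example data: three short trajectories, each ending in an
# absorbing state ("dead" or "cured"). The state names are illustrative only.
sequences <- list(
  c("healthy", "sick", "dead"),
  c("healthy", "healthy", "sick", "cured"),
  c("sick", "cured")
)

# Count transitions inside each sequence, never across sequence boundaries.
count_transitions <- function(seqs) {
  states <- sort(unique(unlist(seqs)))
  counts <- matrix(0, nrow = length(states), ncol = length(states),
                   dimnames = list(states, states))
  for (s in seqs) {
    if (length(s) < 2) next
    from <- s[-length(s)]
    to <- s[-1]
    for (i in seq_along(from)) {
      counts[from[i], to[i]] <- counts[from[i], to[i]] + 1
    }
  }
  counts
}

# Row-normalise the counts. Assumption: a state with no observed outgoing
# transitions is absorbing, so it gets a self-loop with probability 1.
counts_to_probabilities <- function(counts) {
  probs <- counts
  for (state in rownames(counts)) {
    total <- sum(counts[state, ])
    if (total == 0) {
      probs[state, ] <- 0
      probs[state, state] <- 1
    } else {
      probs[state, ] <- counts[state, ] / total
    }
  }
  probs
}

seq_counts <- count_transitions(sequences)
fitted_probs <- counts_to_probabilities(seq_counts)
print(seq_counts)
print(fitted_probs)
```

A matrix like fitted_probs could presumably be wrapped with new("markovchain", transitionMatrix = ...) in the same way as weatherMatrix above, although that skips whatever extra output markovchainFit provides.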
Upvotes: 4
Views: 1936
Reputation: 2503
There is an example that answers your request, at least partially. The holson 'data.frame' is actually a matrix whose rows are life trajectories and whose columns are successive time points. Running
singleMc<-markovchainFit(data=holson[,2:12],name="holson")
fits a transition matrix. This requires all sequences to have the same length. In your case, I suppose the lengths could differ if you stop recording a sequence once an absorbing state is reached. If so, you have to repeat the last state of each row until the maximum length of the life history is reached.
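The padding step could be sketched in base R roughly as follows. The example data and the helper name pad_sequences are hypothetical, and the markovchainFit call is left as a comment because it assumes the padded data is accepted in the same way as holson[,2:12].

```r
# Hypothetical trajectories of unequal length; state names are illustrative.
sequences <- list(
  c("healthy", "sick", "dead"),
  c("healthy", "healthy", "sick", "cured"),
  c("sick", "cured")
)

# Pad every sequence to the longest length by repeating its final state,
# then stack the sequences as rows (one trajectory per row).
pad_sequences <- function(seqs) {
  max_len <- max(lengths(seqs))
  padded <- lapply(seqs, function(s) {
    c(s, rep(s[length(s)], max_len - length(s)))
  })
  do.call(rbind, padded)
}

padded_matrix <- pad_sequences(sequences)
print(padded_matrix)

# Assumption: with the markovchain package loaded, the padded data can be
# fitted like the holson example, e.g.
# paddedFit <- markovchainFit(data = as.data.frame(padded_matrix), name = "padded")
```

Repeating the final state only adds self-transitions for that state, which is harmless when every sequence really ends in an absorbing state. If some sequences were cut off in a non-absorbing state, the padding would overstate that state's self-transition probability.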
Upvotes: 1