user1552372

Reputation: 111

mhsmm package in R: input format?

I'm trying to use the mhsmm package to estimate the parameters of a hidden Markov model from multiple observation sequences.

But for the function hmmfit(x), what should the format of x be? I tried a matrix and a list of lists, but hmmfit(x) does not work and complains that x is not numeric.

Can anyone give an example on how to use this package to estimate HMM parameters? I have a csv file where each row is a sequence of observations and I have multiple rows in the csv file.

Thanks a lot!

Upvotes: 3

Views: 1175

Answers (1)

PaRb

Reputation: 31

I wrote this function to create the right data format:

formatMhsmm <- function(data){

  nb.sequences <- nrow(data)
  nb.observations <- ncol(data)

  #concatenate the rows into one vector: all of row 1, then all of row 2, ...
  sequences <- as.numeric(t(as.matrix(data)))

  #every row is one sequence, so all sequences have the same length
  N <- rep(nb.observations, nb.sequences)

  #creation of hsmm.data object needed for training
  train <- list(x = sequences, N = N)
  class(train) <- "hsmm.data"

  return(train)
}

Basically, the hsmm.data format needs two things: x, a single vector with the observations of all sequences placed one after another, and N, a vector giving the length of each sequence. These go in a list, and then you set the class of that list to "hsmm.data" so that hmmfit can recognize it.
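To make the role of N concrete, here is a minimal sketch with sequences of unequal length. The toy values and the name seqs are made up for illustration, and it only uses base R, so it shows the shape of the object rather than anything specific to your CSV:

```r
# Hypothetical toy sequences of unequal length
seqs <- list(c(1.2, 0.8, 1.1),
             c(5.1, 4.9, 5.3, 5.0, 4.7),
             c(1.0, 5.2))

train <- list(x = unlist(seqs),   # all observations, back to back
              N = lengths(seqs))  # one length per sequence
class(train) <- "hsmm.data"

# The lengths in N must add up to the number of observations in x
stopifnot(sum(train$N) == length(train$x))
```

Since N marks where one sequence ends and the next begins, the order of the entries in N has to match the order in which the sequences were concatenated into x.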

Then you can call it like this. I gave some initial estimates for the HMM parameters, which you can adjust to your needs:

library(mhsmm)

dataset <- read.csv('file.csv',header=TRUE)
train <- formatMhsmm(dataset)

# 4 states HMM    
J=4
#init probabilities
init <- rep(1/J, J)

#transition matrix
P <- matrix(rep(1/J, J*J), nrow = J)

#emission distribution: here I used a Gaussian; muEst and sigmaEst are placeholders for your initial estimates of each state's mean and variance (one value per state, so vectors of length J)
b <- list(mu = muEst, sigma = sigmaEst) 

#starting model for EM
startmodel <- hmmspec(init = init, trans = P, parms.emis = b, dens.emis = dnorm.hsmm)

#EM algorithm fits an HMM to the data
hmm <- hmmfit(train, startmodel, mstep = mstep.norm, maxit = 100)

#print resulting HMM parameters
summary(hmm)
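The call above leaves muEst and sigmaEst for you to fill in. If you have no prior idea of the state means, one possible heuristic (an assumption on my part, not something taken from the package) is to spread the initial means over the quantiles of the pooled data and start every state with the overall variance. The simulated x below is a stand-in for train$x:

```r
# Hypothetical pooled observations standing in for train$x
set.seed(1)
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 3),
       rnorm(50, mean = 6), rnorm(50, mean = 9))

J <- 4  # number of states, as in the example above

# Spread the initial means across the data using evenly spaced quantiles
muEst <- as.numeric(quantile(x, probs = (1:J) / (J + 1)))

# Start every state with the overall sample variance
# (sigma is meant as a variance here, as in the comment above)
sigmaEst <- rep(var(x), J)

b <- list(mu = muEst, sigma = sigmaEst)
```

EM only finds a local optimum, so it can be worth trying a few different starting values and comparing the fitted models.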

A paper where you can find some more information is: O’Connell, Jared, and Søren Højsgaard. "Hidden semi markov models for multiple observation sequences: The mhsmm package for R." Journal of Statistical Software 39.4 (2011): 1-22.

It's a late answer, but hope it can help someone. Cheers

Upvotes: 3
