Luan Vieira
Luan Vieira

Reputation: 117

HMM R package Error in if (d < delta) { : missing value where TRUE/FALSE needed

I am trying to use HMM package in R.

I want 4 hidden states, and my observed values range from 2 to 15.

I can initialize the hidden model without any issues:

if (!require(HMM, quietly = TRUE)) {
  install.packages("HMM")
  library(HMM)
} else {
  library(HMM)
}

# Load data
url <- "https://raw.githubusercontent.com/luancvieira/HMM/main/ottawa_2010-2012.csv"
df <- read.csv(url)

observed_data <- df$AvgTemperature

# Define number and names of states
n_states <- 4
state_names <- paste0("State", 1:n_states)

# Sorting symbols
symbol_names <- as.character(sort(unique(observed_data)))
observed_data <- as.character(observed_data)

# Initialize the HMM model with random probabilities
start_probs <- runif(n_states)
trans_probs <- matrix(runif(n_states * n_states),
               nrow = n_states, ncol = n_states)
emission_probs <- matrix(runif(n_states * length(symbol_names)), 
                  nrow = n_states, ncol = length(symbol_names))

# Normalize rows to ensure probabilities sum to 1
start_probs <- start_probs / sum(start_probs)
trans_probs <- trans_probs / rowSums(trans_probs)
emission_probs <- emission_probs / rowSums(emission_probs)

# Initialize the HMM model
hmm_model <- initHMM(States = state_names,
                     Symbols = symbol_names,
                     startProbs = start_probs,
                     transProbs = trans_probs,
                     emissionProbs = emission_probs)

# Print the initialized model
print(hmm_model)

enter image description here

However, when I try to run it:

bw = baumWelch(hmm = hmm_model,observation = observed_data,
               maxIterations = 100, delta = 0.001)

I get

Error in if (d < delta) { : missing value where TRUE/FALSE needed.

If I reduce the number of iterations to 10, it runs without issues, but 10 iterations is not enough to converge. delta is a stop criteria for the algorithm, as specified here: https://cran.r-project.org/web/packages/HMM/HMM.pdf. The default value for delta is 1e-9, so even if no delta is specified, it still returns an error when run with more iterations.

Upvotes: 0

Views: 157

Answers (1)

Th&#233;o Michelot
Th&#233;o Michelot

Reputation: 186

Hard to know exactly why the error arises, but my guess is that the fitting algorithm isn't converging. This could be because:

  1. there are too many parameters to estimate from too little data. You are trying to estimate 12 transition probabilities and around 50 emission probabilities, from around 1000 data points. This is a difficult problem, for a model that includes a latent component.

  2. the initial parameter values might not be appropriate. Given the complexity of the optimisation required to estimate all parameters, it is crucial to provide carefully-chosen starting parameter values, rather than random values. You would need to think about the expected proportion of each symbol in each state (e.g., state 1 could have high probabilities for "2", "3" and "4", but low probabilities for the other symbols, etc.), and define emissionProbs accordingly. Likewise, the transition probabilities could be initialised based on the expectation that there is state persistence (i.e., the diagonal elements are probably close to 1).

That being said, for continuous data such as temperature, it is probably preferable to use a continuous distribution (e.g., normal, gamma, etc.). This means you wouldn't have to treat the temperature values as qualitative symbols, and you could instead model a distribution over continuous temperature measurements. I don't think the package HMM allows for continuous distributions, as the CRAN page describes it as a package for "discrete time and discrete space Hidden Markov Models". There are several other R packages for HMMs that allow continuous observation distributions; I include code for the package depmixS4 below.

Model fitting

Here I am using a 4-state HMM with Gaussian state-dependent distributions.

library(depmixS4)

# Load data
url <- "https://raw.githubusercontent.com/luancvieira/HMM/main/ottawa_2010-2012.csv"
df <- read.csv(url)
df$date <- as.Date(paste0(df$Year, "-", df$X..Month, "-", df$Day))

# Create model
mod <- depmix(AvgTemperature ~ 1,
              data = df, 
              nstates = 4,
              family = gaussian())

# Fit model
fit <- fit(mod)

Estimated parameters

After fitting the model, you can find the estimated parameters for the hidden state process (transition probabilities) and the observation model (mean and SD of temperature in each state):

> summary(fit, which = "all")
Initial state probabilities model 
pr1 pr2 pr3 pr4 
  0   1   0   0 

Transition matrix 
        toS1  toS2  toS3  toS4
fromS1 0.973 0.000 0.027 0.000
fromS2 0.000 0.960 0.000 0.040
fromS3 0.035 0.000 0.927 0.038
fromS4 0.000 0.043 0.043 0.914

Response parameters 
Resp 1 : gaussian 
    Re1.(Intercept) Re1.sd
St1          13.292  1.066
St2           3.474  1.345
St3           9.898  1.267
St4           6.753  1.186

Plot results

You can also get the most likely state sequence (predicted using the Viterbi algorithm), with the function viterbi. The code below plots the time series of data coloured by state.

library(ggplot2)

# Get most likely state sequence
df$viterbi <- factor(viterbi(fit)[,1])
ggplot(df, aes(date, AvgTemperature, col = viterbi, group = NA)) +
    geom_point(size = 0.5) +
    geom_line()

enter image description here

Upvotes: 1

Related Questions