Brzoskwinia

Reputation: 443

How to handle multiple sequences in training Hidden Markov Model with hmmlearn?

This question is also posted on Cross Validated SE.

Introduction

I'm working with time series data describing the power consumption of 5 devices. My goal is to train the best-fitting Hidden Markov Model for each device and then do classification (i.e. given a power consumption series, tell which device it came from) based on the likelihood scores of the individual models. The observations come from 7 days:

[Image: the observations over the 7 days]

The measurements are not continuous, though: on some days they cover the whole day, while on others they cover only part of it (say, 7 hours). For example, this is the data for one of the devices, split into days:

[Image: data for one of the devices, split into days]

I have trouble understanding how to pass the training data to a model from the hmmlearn library (suppose we take the first 5 days for training and use the remaining 2 as the evaluation and test subsets). Currently it's done this way:

    from hmmlearn import hmm

    model = hmm.GaussianHMM(n_components=n_hidden_states, n_iter=n_iter, random_state=seed)
    model.fit(train_dtf[device_name].to_numpy().reshape(-1, 1))  # one column -> shape (n_samples, 1)

where train_dtf.head(1) looks like this:

[Image: output of train_dtf.head(1)]

and it contains observations with day_index from 0 to 4. My understanding of the hmmlearn docs is that the data passed to model.fit is always a single array; if the data consists of several arrays, they must be concatenated first. I'm not sure this makes sense.
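A sketch of doing that concatenation per day while recording where each day ends, assuming the DataFrame has a day_index column, one column per device, and rows already in time order within each day. The helper to_hmm_input, the toy frame, and the column name "dev_a" are hypothetical.

```python
import numpy as np
import pandas as pd


def to_hmm_input(dtf, device_name, day_col="day_index"):
    """Stack each day's readings into one (n_samples, 1) array plus per-day lengths."""
    # groupby keeps the original row order inside each group.
    days = [group[device_name].to_numpy(dtype=float)
            for _, group in dtf.groupby(day_col, sort=True)]
    X = np.concatenate(days).reshape(-1, 1)
    lengths = [len(day) for day in days]
    return X, lengths


# Toy frame: day 0 has 4 readings, day 1 only 2 (a partial day).
train_dtf = pd.DataFrame({
    "day_index": [0, 0, 0, 0, 1, 1],
    "dev_a": [1.0, 1.2, 0.9, 1.1, 3.0, 3.1],
})
X, lengths = to_hmm_input(train_dtf, "dev_a")
print(X.shape, lengths)  # (6, 1) [4, 2]
```

The resulting X and lengths can then be passed together to model.fit.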

Questions

  1. Isn't it misleading for the model to treat this as one time series, especially since the observation periods differ from day to day? My intuition is that we should somehow indicate that at a certain point a new day, and thus a new pattern, starts. Is my intuition right?

  2. If so, how can I handle it? My first idea is to train the model on one day, save the parameters, then retrain it on the next day using the saved parameters as the starting point, and so on until all training days are used. I'm not confident about this solution, though, because I can't explain why it should work.

  3. Could someone suggest another method better suited to this particular task? I'm definitely going to try DTW, but I'm wondering whether there are other tools.

Upvotes: 1

Views: 1809

Answers (1)

Sycorax

Reputation: 1386

The hmmlearn library lets you pass multiple sequences to fit; you just have to tell fit where each one starts, using the lengths argument:

lengths (array-like of integers, shape (n_sequences, )) – Lengths of the individual sequences in X. The sum of these should be n_samples.

Suppose you have hourly data for 2 days, i.e. 2 * 24 = 48 observations. You would pass lengths=[24, 24] to give the model this information. Then time step 24 (the first hour of the second day, counting from 0) doesn't use information from time step 23; instead, its state is initialized from startprob_, just as for time step 0.

Upvotes: 2
