Reputation: 443
This question is also on Cross-Validated SE
I'm working with time series data describing the power consumption of 5 devices. My goal is to train the best-fitting Hidden Markov Model for each device and do classification (i.e. given a power consumption series, tell which device it came from) based on the likelihood scores of the individual models. Observations come from 7 days:
The measurements are not continuous, though: on some days they cover the whole day, on others only a part, say 7 hours. For example, this is the data for one of the devices, split into days:
I have a problem understanding how to pass the training data (suppose we take the first 5 days for that, and the remaining 2 become the eval and test subsets) to a model from the hmmlearn library. Right now it's done this way:
from hmmlearn import hmm

model = hmm.GaussianHMM(n_components=n_hidden_states, n_iter=n_iter, random_state=seed)
# hmmlearn expects a 2-D array of shape (n_samples, n_features)
model.fit(train_dtf[device_name].to_numpy().reshape(-1, 1))
where train_dtf (its head(1) output is not reproduced here) contains the observations with day_index from 0 to 4. What I understand from the hmmlearn docs is that the data passed to model.fit is always a single array; if it isn't, it must be concatenated first. I'm not sure that makes sense.
Isn't it misleading for the model to assume that this is one time series, especially if the observation periods differ from day to day? My intuition is that we should somehow indicate that at some point a new day starts, and thus a new pattern starts. Is my intuition right?
If so, how can I handle it? My first idea is to train the model on one day, save its parameters, then retrain it on the next day using the saved parameters as the starting point, and so on until all training days are used. I'm not confident about this solution though, because I can't explain why it should work.
Maybe someone could propose any other method that is better for this particular task? I'm going to try DTW for sure, but I'm wondering if there are some other tools.
Upvotes: 1
Views: 1809
Reputation: 1386
The hmmlearn library allows you to train on multiple sequences. According to the documentation for fit, you pass them concatenated into one array and tell fit, through the lengths argument, where each sequence ends:
lengths (array-like of integers, shape (n_sequences, )) – Lengths of the individual sequences in X. The sum of these should be n_samples.
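Suppose you have hourly data for 2 days: you have 2 * 24 = 48 observations. The argument lengths would be [24, 24] to pass this information to the model. The first observation of day 2 then doesn't use information from the last observation of day 1; instead, its hidden state is initialized from startprob_, just like the first observation of day 1. The sequences don't have to be equally long, so a full day and a 7-hour day can be mixed.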
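Applied to the setup in the question, this could look roughly like the sketch below. It derives lengths from a day label per row and passes it to fit. The helper names build_sequences and fit_device_model, the toy numbers, and the default hyperparameters are made up for illustration; the sketch also assumes the rows are already sorted by day and by time within each day.

```python
import numpy as np


def build_sequences(values, day_index):
    """Return (X, lengths) for hmmlearn from a flat series and its day labels.

    Assumes rows are sorted by day and, within each day, by time.
    """
    values = np.asarray(values, dtype=float)
    day_index = np.asarray(day_index)
    # Positions where the day label changes mark the start of a new sequence.
    starts = np.flatnonzero(np.r_[True, day_index[1:] != day_index[:-1]])
    # Each length is the gap between consecutive starts; the last runs to the end.
    lengths = np.diff(np.r_[starts, len(values)])
    X = values.reshape(-1, 1)  # hmmlearn expects shape (n_samples, n_features)
    return X, lengths.tolist()


def fit_device_model(X, lengths, n_hidden_states=3, n_iter=50, seed=0):
    """Fit one GaussianHMM on several independent sequences (hypothetical helper)."""
    # Imported here so the data preparation above can be used without hmmlearn.
    from hmmlearn import hmm

    model = hmm.GaussianHMM(
        n_components=n_hidden_states, n_iter=n_iter, random_state=seed
    )
    # lengths tells fit where each day ends, so days are not chained together.
    model.fit(X, lengths=lengths)
    return model


# Toy data: day 0 has 4 readings, day 1 has 2, day 2 has 3 (uneven on purpose).
values = [1.0, 1.2, 5.1, 5.0, 0.9, 1.1, 5.2, 4.9, 1.0]
day_index = [0, 0, 0, 0, 1, 1, 2, 2, 2]
X, lengths = build_sequences(values, day_index)
print(X.shape, lengths)  # (9, 1) [4, 2, 3]
```

With the real data you would call fit_device_model(X, lengths) once per device. score also accepts a lengths argument, so the held-out days can be scored the same way when comparing the per-device log-likelihoods.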
Upvotes: 2