piman314

Reputation: 5355

Formatting data for hmmlearn

I'm trying to fit a hidden Markov model using hmmlearn in Python. I assume that my data is not formatted correctly, but the hmmlearn documentation is light on this. Intuitively, I would format the data as a 3-dimensional array of n_observations x n_time_points x n_features, but hmmlearn seems to want a 2D array.

import numpy as np
from hmmlearn import hmm
X = np.random.rand(10,5,3)
clf = hmm.GaussianHMM(n_components=3, n_iter=10)
clf.fit(X)

Which gives the following error:

ValueError: Found array with dim 3. Estimator expected <= 2.

Does anyone know how to format data in order to build the HMM I'm after?

Upvotes: 3

Views: 3637

Answers (2)

Vadim Smolyakov

Reputation: 1197

In the case of a single time-series observation, the hmmlearn fit method expects the data as a 2D column vector of shape (n_samples, 1), which can be obtained using reshape(-1, 1):

import numpy as np
from hmmlearn import hmm

X = np.array([1, 1, 0, -1, -1])
model = hmm.GaussianHMM(n_components=2, n_iter=100)
model.fit(X.reshape(-1, 1))  # shape (5, 1): five time points, one feature

Upvotes: 1

Sergei Lebedev

Reputation: 2679

Note: All of the following is relevant for the currently unreleased version 0.2.0 of hmmlearn. The version 0.1.0 available on PyPI uses a different API inherited from sklearn.hmm.

To fit the model to multiple sequences you have to provide two arrays:

  • X --- a concatenation of the data from all sequences,
  • lengths --- an array of sequence lengths.

I'll try to illustrate these conventions with an example. Consider two 1D sequences

X1 = [1, 2, 0, 1, 1]
X2 = [42, 42]

To pass both sequences to the .fit method, we first concatenate them into a single array, reshaped to a column because hmmlearn expects 2D input of shape (n_samples, n_features), and then compute an array of lengths

X = np.append(X1, X2).reshape(-1, 1)
lengths = [len(X1), len(X2)]
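
Applied to the 3D array from the question, the same convention might look like the sketch below. It assumes hmmlearn 0.2.0 or later, where fit accepts a lengths argument; the random data and variable names are only illustrative, and the hmmlearn import is guarded so the reshaping can be checked on its own.

```python
import numpy as np

# Same shape as in the question: 10 sequences, 5 time points, 3 features.
# The random data is purely illustrative.
rng = np.random.default_rng(0)
data = rng.random((10, 5, 3))

n_sequences, n_time_points, n_features = data.shape

# Stack all sequences along the time axis, giving shape (10 * 5, 3).
X = data.reshape(-1, n_features)
# One entry per sequence, so the model knows where each sequence ends.
lengths = [n_time_points] * n_sequences

try:
    from hmmlearn import hmm
except ImportError:
    hmm = None  # the reshaping above can still be inspected without hmmlearn

if hmm is not None:
    # Assumes the fit(X, lengths) signature of hmmlearn >= 0.2.0.
    model = hmm.GaussianHMM(n_components=3, n_iter=10)
    model.fit(X, lengths)
```

Rows 0 to 4 of X hold the first sequence, rows 5 to 9 the second, and so on. Sequences of unequal length work the same way, as long as lengths sums to the number of rows in X.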

Upvotes: 4
