Reputation: 837
Here is my problem: I'm trying to train a Hidden Markov Model using hmmlearn. I'm new to the language, and I have some difficulty understanding the differences between lists and arrays. Here is my code:
from hmmlearn import hmm
from babel import lists
import numpy as np
import unidecode as u
from numpy import char
l = []
data = []
gods_egypt = ["Amon","Anat","Anouket","Anubis","Apis","Atoum","Bastet","Bès","Gheb","Hâpy","Harmachis","Hathor","Heh","Héket","Horus","Isis","Ka","Khepri","Khonsou","Khnoum","Maât","Meresger","Mout","Nefertoum","Neith","Nekhbet","Nephtys","Nout","Onouris","Osiris","Ouadjet","Oupaout","Ptah","Rê","Rechef","Renenoutet","Satet","Sebek","Sekhmet","Selkis","Seth","Shou","Sokaris","Tatenen","Tefnout","Thot","Thouéris"]
for i in range(0, len(gods_egypt)):
    data.append([])
    for j in range(0, len(gods_egypt[i])):
        data[i].append([u.unidecode(gods_egypt[i][j].lower())])
    l.append(len(data[i]))
data = np.asarray(data).reshape(-1,1)
model = hmm.MultinomialHMM(20, verbose=True)
model = model.fit(data, l)
And the resulting output:
Traceback (most recent call last):
  File "~~~\HMM_test.py", line 17, in <module>
    model = model.fit(data, l)
  File "~~~\Python\Python36\site-packages\hmmlearn\base.py", line 420, in fit
    X = check_array(X)
  File "~~~\Python36-32\lib\site-packages\sklearn\utils\validation.py", line 402, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.
I have seen in the question "ValueError: setting an array element with a sequence" that it might be a problem of arrays with different lengths, but I can't figure out how to solve it.
Any suggestions?
Upvotes: 1
Views: 1632
Reputation: 3645
The error itself comes from the fact that model.fit() expects an array of arrays of numerical values. Right now your input data is an array whose elements are lists (of lists of strings). This is what provokes the error: where the function expects each array element to be a number, it finds a sequence, i.e. the list (of lists of strings).
However, even if you fix the list issue, another issue will arise:
Learning an HMM implies computing numerical quantities via some set of equations. The input data to learn an HMM should be numerical, not a set of letters (unless hmmlearn has a special option for characters that I am not aware of).
You need to first transform the letters into numbers if you want to work with HMMs.
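A minimal sketch of that transformation, assuming one integer symbol per letter and all words concatenated into a single column, with a separate list giving the length of each word (the X / lengths layout that model.fit(X, lengths) takes). To keep the sketch dependency-free, the standard-library unicodedata module is swapped in for unidecode, and the list of gods is shortened:

```python
import unicodedata

import numpy as np


def strip_accents(text):
    """Stand-in for unidecode: decompose accented letters, drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Shortened version of the question's list
gods_egypt = ["Amon", "Bès", "Hâpy", "Héket", "Ka"]

words = [strip_accents(god.lower()) for god in gods_egypt]

# Map every distinct letter (sorted alphabetically) to an integer symbol id
alphabet = sorted(set("".join(words)))
symbol_of = {letter: k for k, letter in enumerate(alphabet)}

# All words concatenated into one column of integers, shape (n_letters, 1)
X = np.array([[symbol_of[c]] for word in words for c in word], dtype=int)
# Length of each word, so the model knows where one sequence ends and the next begins
lengths = [len(word) for word in words]

print(X.ravel(), lengths)
```

X and lengths could then be passed to model.fit(X, lengths). Which hmmlearn class expects integer symbols like these depends on the version: as far as I know, older releases used MultinomialHMM for this, while newer ones provide CategoricalHMM instead, so check the documentation of the version you have installed.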
I do not know what your end goal is. HMMs are aimed at modeling data for generation or classification purposes (the latter if several HMMs are trained). What do you intend to do once you have a model trained on the letters composing the words?
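To illustrate the classification use case: with one HMM trained per class, a new sequence is assigned to the model under which it is most likely. The sketch below computes that likelihood with a textbook scaled forward algorithm in plain NumPy; it is not hmmlearn's code (with hmmlearn, model.score(X, lengths) returns the log-likelihood). The two models and their parameters are made up for illustration. Each observation is used as a column index into the emission matrix, which is another reason the letters have to be mapped to integers first.

```python
import numpy as np


def log_likelihood(obs, startprob, transmat, emissionprob):
    """Scaled forward algorithm: log P(obs | model) for a discrete-emission HMM."""
    # obs[t] is an integer symbol id, used as a column index into emissionprob
    alpha = startprob * emissionprob[:, obs[0]]
    log_prob = 0.0
    for symbol in obs[1:]:
        scale = alpha.sum()
        log_prob += np.log(scale)  # rescaling avoids underflow on long sequences
        alpha = ((alpha / scale) @ transmat) * emissionprob[:, symbol]
    return log_prob + np.log(alpha.sum())


# Two hypothetical 2-state models over a 2-symbol alphabet (made-up parameters)
start = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.1, 0.9]])
models = {
    "A": (start, trans, np.array([[0.8, 0.2], [0.6, 0.4]])),  # favors symbol 0
    "B": (start, trans, np.array([[0.2, 0.8], [0.4, 0.6]])),  # favors symbol 1
}

obs = [0, 0, 1, 0, 0]
scores = {name: log_likelihood(obs, *params) for name, params in models.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

For generation, the same trained parameters would instead be sampled from, and the resulting symbol ids mapped back to letters.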
As for the format in which the data should be provided to the different functions, I suggest that you take a look at the documentation. It includes tutorials on using the library.
Upvotes: 2