Reputation: 21
I am trying to train an HMM to estimate the model parameters for a part-of-speech tagging problem.
I am using the PythonHMM package from the following resource: https://github.com/jason2506/PythonHMM
The original training data looks like this:
Sr.No. Observations
1 killer/N clown/N
2 killer/N problem/N
3 crazy/A problem/N
4 crazy/A clown/N
5 problem/N crazy/A clown/N
6 clown/N crazy/A killer/N
From the original data I built a list of sequences, each one a (state list, symbol list) pair, which is the format PythonHMM asks for when training a model. It looks like this:
sequences = [
(['N','N'],['killer','clown']),
(['N','N'],['killer','problem']),
(['A','N'],['crazy','problem']),
(['A','N'],['crazy','clown']),
(['N','A','N'],['problem','crazy','clown']),
(['N','A','N'],['clown','crazy','killer'])
]
After importing hmm.py, I call the 'train' function of hmm:
model_hmm = hmm.train(sequences)
and I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-41-24d7c607e58c> in <module>()
----> 1 model_hmm = hmm.train(sequences)
/home/sk/hmm.py in train(sequences, delta, smoothing)
95 for _, symbol_list in sequences:
96 model.learn(symbol_list, smoothing)
---> 97 new_likelihood += log(model.evaluate(symbol_list))
98
99 new_likelihood /= length
ValueError: math domain error
I cannot figure out why this error occurs. Is there a problem with the way I pass the sequences to the train function, or is it something else?
I also could not find any example of training an HMM for this kind of problem. Please help me resolve this error.
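For reference, the failing line in the traceback calls log() on the value returned by model.evaluate(symbol_list), and Python's math.log raises ValueError for an argument of 0. One plausible reading (an assumption, not verified against PythonHMM's source) is that evaluate returned a probability of 0 for one of the sequences. A minimal sketch of that failure mode, where safe_log is a hypothetical helper and not part of PythonHMM:

```python
import math


def safe_log(p):
    """Hypothetical helper: log(p), or -inf when p is exactly 0."""
    if p < 0:
        raise ValueError("probability must be non-negative")
    return math.log(p) if p > 0 else float("-inf")


# math.log refuses 0, which is the kind of ValueError shown in the traceback.
try:
    math.log(0.0)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)   # the ValueError text raised for log(0)
print(safe_log(0.0))   # -inf instead of an exception
```

The traceback also shows that train accepts a smoothing argument. Whether a non-zero value keeps the evaluated probability above 0 for this data is untested here.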
Upvotes: 2
Views: 2750
Reputation: 1930
The hmmlearn implementation already supports training an HMM with multiple observation sequences. See "train hmm with multiple sequences".
Upvotes: 3
Reputation: 281
The nltk library has an HMM model that does exactly what you are trying to do.
See the following link for a better understanding: https://gist.github.com/blumonkey/007955ec2f67119e0909
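Since the training data in the question is already tagged, the parameters can also be estimated by plain counting. The following is a minimal pure-Python sketch of that idea, not nltk's or PythonHMM's implementation. The names estimate_hmm and viterbi are hypothetical, and add-one smoothing is an assumption chosen so that no probability is 0:

```python
import math
from collections import Counter, defaultdict

sequences = [
    (['N', 'N'], ['killer', 'clown']),
    (['N', 'N'], ['killer', 'problem']),
    (['A', 'N'], ['crazy', 'problem']),
    (['A', 'N'], ['crazy', 'clown']),
    (['N', 'A', 'N'], ['problem', 'crazy', 'clown']),
    (['N', 'A', 'N'], ['clown', 'crazy', 'killer']),
]


def estimate_hmm(sequences, smoothing=1.0):
    """Count-based estimates; smoothing must be > 0 to avoid zeros."""
    states = sorted({t for tags, _ in sequences for t in tags})
    symbols = sorted({w for _, words in sequences for w in words})
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for tags, words in sequences:
        start[tags[0]] += 1
        for prev, cur in zip(tags, tags[1:]):
            trans[prev][cur] += 1
        for tag, word in zip(tags, words):
            emit[tag][word] += 1

    def normalize(counts, keys):
        # Add `smoothing` to every count, then divide by the new total.
        total = sum(counts[k] for k in keys) + smoothing * len(keys)
        return {k: (counts[k] + smoothing) / total for k in keys}

    start_p = normalize(start, states)
    trans_p = {s: normalize(trans[s], states) for s in states}
    emit_p = {s: normalize(emit[s], symbols) for s in states}
    return states, start_p, trans_p, emit_p


def viterbi(words, states, start_p, trans_p, emit_p):
    """Most likely tag sequence, in log space. Words must be in the vocabulary."""
    best = {s: (math.log(start_p[s]) + math.log(emit_p[s][words[0]]), [s])
            for s in states}
    for word in words[1:]:
        step = {}
        for s in states:
            score, path = max(
                ((best[p][0] + math.log(trans_p[p][s]), best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            step[s] = (score + math.log(emit_p[s][word]), path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


states, start_p, trans_p, emit_p = estimate_hmm(sequences)
tags = viterbi(['crazy', 'clown'], states, start_p, trans_p, emit_p)
print(tags)  # ['A', 'N']
```

With smoothing set to 0, unseen transitions or emissions get probability 0 and math.log fails, so the sketch keeps it positive. Words outside the training vocabulary raise KeyError here. A real tagger would need an unknown-word strategy.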
Upvotes: 0