I just extracted a frame-level alignment from my model:
fash-b-an251 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 134 134 134 134 134 134 134 134 134 44 44 44 44 44 44 44 44 44 111 111 111 111 111 111 111 111 111 111 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
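To see how many frames each phone actually occupies, the alignment line can be collapsed into runs of identical IDs. This is a minimal sketch that assumes the first token is the utterance ID and every following integer is a per-frame phone ID (not a transition ID). The function name `alignment_to_segments` and the short example line are hypothetical.

```python
from itertools import groupby


def alignment_to_segments(line):
    """Collapse a frame-level alignment line into (phone_id, start_frame, num_frames) runs."""
    utt_id, *ids = line.split()
    segments = []
    frame = 0
    # groupby yields one group per run of consecutive identical IDs
    for phone_id, run in groupby(ids):
        n = len(list(run))
        segments.append((int(phone_id), frame, n))
        frame += n
    return utt_id, segments


# Toy alignment (hypothetical), shorter than the real one above
utt, segs = alignment_to_segments("utt1 1 1 1 134 134 44 44 44 1")
for phone_id, start, n in segs:
    print(utt, phone_id, start, n)
```

With a 10 ms frame shift (a common default, assumed here), a run of n frames corresponds to roughly n × 10 ms of audio.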
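Each phone is listed in the data/lang/phones.txt file, which maps each phone symbol to an integer ID. According to this file, phones can be separated into X, X_B, X_I, X_E, and X_S, where _B marks a phone at the beginning of a word, _E a phone at the end of a word, _I a phone inside a word, _S a phone that forms a word on its own (singleton), and plain X a phone with no word-position marker.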
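A sketch of reading phones.txt and separating the base phone from its word-position suffix, assuming the two-column `symbol id` layout per line. The phone names in the example (`aa_B` and so on) are made up, and the helper names are hypothetical.

```python
WORD_POSITIONS = ("B", "I", "E", "S")


def split_word_position(phone_symbol):
    """Split e.g. 'aa_B' into ('aa', 'B'); a phone without a position suffix returns (symbol, None)."""
    base, sep, suffix = phone_symbol.rpartition("_")
    if sep and suffix in WORD_POSITIONS:
        return base, suffix
    return phone_symbol, None


def load_phone_table(lines):
    """Parse 'symbol integer-id' lines into an id -> symbol dict, skipping malformed lines."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            symbol, idx = parts
            table[int(idx)] = symbol
    return table


# Hypothetical phones.txt contents
table = load_phone_table(["<eps> 0", "SIL 1", "aa_B 2", "aa_I 3", "aa_E 4", "aa_S 5"])
for idx in sorted(table):
    print(idx, table[idx], split_word_position(table[idx]))
```

In practice you would pass `open("data/lang/phones.txt")` as `lines`.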
I was under the impression that each phoneme is modeled by a three-state HMM, and therefore thought it would be possible to decode phones at the frame level: three frames => three sets of features => a sequence of three sets of emission probabilities => the phoneme.
But this doesn't seem to be the case (each phone in the alignment above spans many more than three frames), so I assume the features contain static, delta, and delta-delta information.
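Delta features are commonly computed with the regression formula d_t = Σ_{n=1..N} n (c_{t+n} − c_{t−n}) / (2 Σ_{n=1..N} n²), and delta-deltas by applying the same formula to the deltas. The sketch below assumes N = 2 and clamps frame indices at the utterance edges; Kaldi's add-deltas uses a similar scheme, but this is not claimed to match it exactly. What it illustrates: each output frame already depends on neighbouring frames (±2 for deltas, ±4 for delta-deltas), so one feature vector carries some context while still describing a single frame.

```python
def deltas(frames, window=2):
    """Regression deltas: d_t = sum_n n*(c[t+n] - c[t-n]) / (2 * sum_n n^2), edge indices clamped."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, T - 1)][d]   # clamp at the last frame
                behind = frames[max(t - n, 0)][d]      # clamp at the first frame
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_deltas(frames, window=2):
    """Concatenate static, delta and delta-delta coefficients for every frame."""
    d1 = deltas(frames, window)
    d2 = deltas(d1, window)
    return [s + a + b for s, a, b in zip(frames, d1, d2)]


# Toy 2-dimensional "static" features (hypothetical)
toy = [[float(t), float(t * t)] for t in range(6)]
print(len(add_deltas(toy)[0]))  # → 6 (2 static + 2 delta + 2 delta-delta)
```

With 13 static coefficients this gives the familiar 39-dimensional vector.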
If so, is it possible to extract the expected posterior probabilities of the three states of each phoneme?
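Per-frame state posteriors of this kind are what the forward-backward algorithm computes. Below is a toy sketch for a single three-state left-to-right phone HMM with self-loops; the transition probabilities and emission likelihoods are invented, and in a real system the emissions would come from the acoustic model (for example, DNN outputs divided by state priors in a hybrid setup). The self-loops are what let one phone occupy many more than three frames.

```python
def state_posteriors(emissions, trans, init, final):
    """Forward-backward: gamma[t][j] = P(state j at frame t | all frames of the segment).

    emissions[t][j] = p(o_t | j), trans[i][j] = P(j | i),
    init[j] = weight for starting in j, final[j] = weight for ending in j.
    """
    T, S = len(emissions), len(init)
    alpha = [[0.0] * S for _ in range(T)]
    beta = [[0.0] * S for _ in range(T)]
    for j in range(S):
        alpha[0][j] = init[j] * emissions[0][j]
    for t in range(1, T):
        for j in range(S):
            alpha[t][j] = emissions[t][j] * sum(alpha[t - 1][i] * trans[i][j] for i in range(S))
    beta[T - 1] = list(final)
    for t in range(T - 2, -1, -1):
        for i in range(S):
            beta[t][i] = sum(trans[i][j] * emissions[t + 1][j] * beta[t + 1][j] for j in range(S))
    total = sum(alpha[T - 1][j] * final[j] for j in range(S))
    if total == 0.0:
        raise ValueError("no valid path: segment too short for this topology")
    return [[alpha[t][j] * beta[t][j] / total for j in range(S)] for t in range(T)]


# Hypothetical 3-state left-to-right topology: start in state 0, must end in state 2
TRANS = [[0.6, 0.4, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]]
INIT = [1.0, 0.0, 0.0]
FINAL = [0.0, 0.0, 1.0]

# Invented emission likelihoods for a 5-frame segment
emissions = [[0.9, 0.1, 0.1], [0.7, 0.3, 0.1], [0.2, 0.8, 0.2], [0.1, 0.5, 0.6], [0.1, 0.1, 0.9]]
for t, row in enumerate(state_posteriors(emissions, TRANS, INIT, FINAL)):
    print(t, [round(p, 3) for p in row])
```

Each row sums to 1; with exactly three frames the only allowed path is 0 → 1 → 2, so the posteriors become one-hot.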
And given a single set of features (enough to decode a phoneme), is it possible to decode it to a phoneme using a premade script?