Reputation: 99
I am trying to train multiple GMM models, one per training word. When I then test the models with an unseen test word, I get negative values. Any idea what I am doing wrong?
from python_speech_features import mfcc
from python_speech_features import delta
from sklearn.mixture import GaussianMixture
import pandas as pd
import scipy.io.wavfile as wav
import os, glob
import numpy as np
Reading all training files
rate = []  # sample rates of all training wav files
sig = []   # signals of all training wav files
for filename in glob.glob(r'Data\Training\*.wav'):  # raw string keeps the backslashes literal
    sr_value, x_value = wav.read(filename)
    rate.append(sr_value)
    sig.append(x_value)
Calculating mfcc for each signal
all_mfcc_feat = []
for audio in sig:
    # defaults
    all_mfcc_feat.append(mfcc(signal=audio, samplerate=16000, winlen=0.025, winstep=0.01, nfilt=26, nfft=512, numcep=13, preemph=0.97, ceplifter=22, appendEnergy=False))
Calculating deltas for each signal
delta_oneT = []
double_deltaT = []
for mfcc_feat in all_mfcc_feat:  # named mfcc_feat so the imported mfcc() function is not shadowed
    delta1 = delta(mfcc_feat, 2)
    delta_oneT.append(delta1)  # calculating delta
    double_deltaT.append(delta(delta1, 2))  # calculating double delta from previous delta
training_feat = []
for i in range(0, len(all_mfcc_feat)):  # iterate through signals
    df = pd.DataFrame(data=None)
    for j in range(0, len(all_mfcc_feat[i])):  # iterate through list of mfcc's
        combined = np.concatenate([all_mfcc_feat[i][j], delta_oneT[i][j], double_deltaT[i][j]])
        df = df.append(pd.Series(combined), ignore_index=True)
    dfnew = df.values
    training_feat.append(dfnew)
(sr_valueX, x_valueX) = wav.read(r'Data\Testing\wiehedT.wav')
mfcc_test = mfcc(x_valueX, sr_valueX)
delta_oneTest = []
double_deltaTest = []
delta1T = delta(mfcc_test, 2)
delta_oneTest.append(delta1T) #calculating delta
double_deltaTest.append(delta(delta1T, 2)) #calculating double delta from the test delta (delta1T, not the leftover training delta1)
df = pd.DataFrame(data = None, )
for i in range(0, len(mfcc_test)):
    combined = np.concatenate([mfcc_test[i], delta_oneTest[0][i], double_deltaTest[0][i]])
    df = df.append(pd.Series(combined), ignore_index=True)
testingFeat = df.values
allmodels = []
for feat in training_feat:
    gmm = GaussianMixture()  # default weights and means
    gmm.fit(feat)
    allmodels.append(gmm)
i = 1
for gmm in allmodels:
    print 'Model ',i
    scores = gmm.score(testingFeat)
    print scores
    i = i+1
Upvotes: 0
Views: 3996
Reputation: 2966
The code works as expected. The function gmm.score(testingFeat) returns the average log-likelihood per sample of the input data (score_samples() returns the value for each individual point). Here is the documentation of score()
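Log probabilities are simply the logarithms of the probabilities. Since a probability lies in the interval (0, 1), its logarithm is negative. (Strictly speaking, a GMM returns log densities, which are negative whenever the density is below 1, as is typical for high-dimensional features.) To reverse this you can apply the exponential function as in this post.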
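As a rough illustration, here is a minimal sketch on synthetic data. The random 39-dimensional vectors are only an assumed stand-in for MFCC + delta + double-delta frames, and the single diagonal-covariance component is an arbitrary choice for the demo, not something taken from the question. It shows that score() is the mean of the per-frame values from score_samples(), that the result is negative here, and that exponentiating it gives a (tiny) density value rather than a percentage.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)

# Synthetic stand-in for feature frames (assumption): rows are frames,
# columns are 13 MFCC + 13 delta + 13 double-delta coefficients.
train = rng.normal(size=(200, 39))
test = rng.normal(size=(50, 39))

gmm = GaussianMixture(n_components=1, covariance_type='diag', random_state=0)
gmm.fit(train)

per_frame = gmm.score_samples(test)  # one log-likelihood per frame
average = gmm.score(test)            # a single float: the mean of per_frame

print(per_frame.shape)
print(average)
print(np.isclose(average, per_frame.mean()))

# exp() undoes the logarithm, but the result is the geometric mean of the
# per-frame densities, not a probability between 0% and 100%.
print(np.exp(average))
```

When comparing several models on the same test word, the model with the largest (least negative) score is the best match.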
However, this won't produce percentage-like probability scores, because your data might not be uniformly distributed. This is explained in more detail here.
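If probability-like numbers across the models are still wanted, one common option (not something the question's code does) is to normalize the per-model log-likelihoods against each other. The sketch below assumes every word model is equally likely a priori, and the helper name model_posteriors and the example numbers are hypothetical. It subtracts the maximum before exponentiating so that very negative values do not underflow to zero.

```python
import math

def model_posteriors(log_likelihoods):
    """Convert per-model log-likelihoods into values that sum to 1.

    Assumes an equal prior for every model (hypothetical helper).
    """
    peak = max(log_likelihoods)
    # Shifting by the maximum keeps exp() away from underflow.
    weights = [math.exp(ll - peak) for ll in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]

# Made-up total log-likelihoods of one test word under three models.
# With scikit-learn, a total could be taken as gmm.score(X) * len(X).
example_scores = [-5510.2, -5498.7, -5530.9]

posteriors = model_posteriors(example_scores)
best = posteriors.index(max(posteriors))

print(posteriors)
print(best)
```

Because neighbouring audio frames are far from independent, values obtained this way tend to be overconfident, so they are better read as a ranking aid than as calibrated probabilities.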
Upvotes: 1