user42967

Reputation: 99

Gaussian Mixture Model gives negative value scores

I am training multiple GMM models, one per training word. When I test the models on an unseen test word, I get negative scores. Any idea what I am doing wrong?

from python_speech_features import mfcc
from python_speech_features import delta
from sklearn.mixture import GaussianMixture 
import pandas as pd
import scipy.io.wavfile as wav
import os, glob
import numpy as np

Reading all training files

rate = []  # sample rates of all training wav files
sig = []   # signals of all training wav files
for filename in sorted(glob.glob(os.path.join('Data', 'Training', '*.wav'))):  # portable path, stable model order
    sr_value, x_value = wav.read(filename)
    rate.append(sr_value)
    sig.append(x_value)

Calculating mfcc for each signal

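all_mfcc_feat = []
for audio, sr in zip(sig, rate):
    # use each file's own sample rate; note appendEnergy=False is NOT the library default (True),
    # so the test file must be processed with these exact settings too
    all_mfcc_feat.append(mfcc(signal=audio, samplerate=sr, winlen=0.025, winstep=0.01, nfilt=26, nfft=512, numcep=13, preemph=0.97, ceplifter=22, appendEnergy=False))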

Calculating deltas for each signal

delta_oneT = []
double_deltaT = []
for feat in all_mfcc_feat:  # don't name this `mfcc`: it would shadow the imported mfcc() function
    delta1 = delta(feat, 2)
    delta_oneT.append(delta1)               # calculating delta
    double_deltaT.append(delta(delta1, 2))  # calculating double delta from previous delta

training_feat = []
for i in range(len(all_mfcc_feat)):  # iterate through signals
    # stack MFCC, delta and double delta side by side: one 39-column row per frame
    # (DataFrame.append was removed in pandas 2.0, and appending row by row is slow)
    combined = np.hstack([all_mfcc_feat[i], delta_oneT[i], double_deltaT[i]])
    training_feat.append(combined)


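(sr_valueX, x_valueX) = wav.read(os.path.join('Data', 'Testing', 'wiehedT.wav'))

# use exactly the training MFCC settings (the defaults would set appendEnergy=True)
mfcc_test = mfcc(signal=x_valueX, samplerate=sr_valueX, winlen=0.025, winstep=0.01, nfilt=26, nfft=512, numcep=13, preemph=0.97, ceplifter=22, appendEnergy=False)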

delta_oneTest = []
double_deltaTest = []
delta1T = delta(mfcc_test, 2)
delta_oneTest.append(delta1T) #calculating delta
double_deltaTest.append(delta(delta1T, 2)) # double delta from the test delta (delta1 was the last training delta)


# same 39-column layout as the training features
testingFeat = np.hstack([mfcc_test, delta_oneTest[0], double_deltaTest[0]])

allmodels = []
for feat in training_feat:
    gmm = GaussianMixture()  # defaults: n_components=1, covariance_type='full'
    gmm.fit(feat)
    allmodels.append(gmm)

for i, gmm in enumerate(allmodels, start=1):
    print('Model', i)
    scores = gmm.score(testingFeat)  # average log-likelihood per frame
    print(scores)
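Training one model per word is usually followed by comparing the models' scores on the same test word, rather than reading any single score in isolation. The sketch below is an assumption about the intended use, not part of the original code: it replaces the real MFCC matrices with synthetic Gaussian data (the means -2, 0 and 2 are arbitrary) and reuses the question's variable names only for readability.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# synthetic stand-ins for the per-word training matrices, shape (n_frames, 39);
# each "word" is just a Gaussian blob with a different mean (an assumption)
training_feat = [rng.normal(loc=m, size=(300, 39)) for m in (-2.0, 0.0, 2.0)]
testingFeat = rng.normal(loc=2.0, size=(60, 39))  # drawn like word index 2

allmodels = [GaussianMixture(n_components=2, random_state=0).fit(f) for f in training_feat]

# score() is an average log-likelihood: negative here, but comparable across models
scores = np.array([gmm.score(testingFeat) for gmm in allmodels])
best = int(np.argmax(scores))  # highest (least negative) score wins

for i, s in enumerate(scores, start=1):
    print('Model', i, s)
print('Best model:', best + 1)
```

All three scores are negative, yet the model trained on matching data still stands out because its score is the least negative.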

Upvotes: 0

Views: 3996

Answers (1)

SuperKogito

Reputation: 2966

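The negative scores are expected. The function gmm.score(testingFeat) returns the average log-likelihood per sample (here, per MFCC frame) of the input data, as a single number; gmm.score_samples(testingFeat) returns the log-likelihood of each sample individually. Here is the documentation of score()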
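The relationship between score() and score_samples() can be checked with a minimal sketch on synthetic data (random normal features standing in for the 39-column MFCC + delta + double-delta matrix; the sizes are arbitrary assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# synthetic stand-in for a (n_frames, 39) feature matrix
train = rng.normal(size=(500, 39))
test = rng.normal(size=(80, 39))

gmm = GaussianMixture(n_components=2, random_state=0).fit(train)

per_frame = gmm.score_samples(test)  # one log-likelihood per frame, shape (80,)
average = gmm.score(test)            # a single float: the mean of per_frame

print(per_frame.shape, average)
print(np.isclose(average, per_frame.mean()))
```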

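Log probabilities are simply the logarithms of probabilities, which lie in the interval (0, 1), so they are negative. Strictly, for continuous features such as MFCCs the GMM returns log probability densities, which can be positive wherever the density exceeds 1; with 39-dimensional features, however, the densities are typically far below 1, so the scores come out negative. To undo the logarithm, you can apply the exponential function as in this post.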
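The difference between probabilities and densities can be seen in a minimal sketch on synthetic data (an illustration, not the asker's MFCCs): a 1-D Gaussian with a very small spread has a density well above 1, so its average log-likelihood is positive, while 39-dimensional unit-variance data gets a large negative score, much like stacked MFCC, delta and double-delta features.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# 1-D data with a tiny spread: the fitted density peaks far above 1
narrow = rng.normal(loc=0.0, scale=0.01, size=(1000, 1))
# 39-D data with unit spread: the density is tiny everywhere
wide = rng.normal(loc=0.0, scale=1.0, size=(1000, 39))

gmm_narrow = GaussianMixture(random_state=0).fit(narrow)
gmm_wide = GaussianMixture(random_state=0).fit(wide)

narrow_score = gmm_narrow.score(narrow)
wide_score = gmm_wide.score(wide)

# exponentiating gives density values, which are not bounded by 1
densities = np.exp(gmm_narrow.score_samples(narrow))

print(narrow_score, wide_score, densities.max())
```

Here the 1-D score is positive and the 39-D score is negative, even though the same model type fits both datasets well.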

However, this won't give percentage-like probability scores: exponentiating a log density gives a density value, which is not confined to [0, 1] and cannot be read as a percentage. This is explained in more detail here.
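If scores that sum to 1 across the word models are wanted, one common approach (not mentioned in the answer, so treat it as an assumption) is to apply Bayes' rule over the set of models: add a log prior to each model's total log-likelihood and normalise with a log-sum-exp softmax. The total log-likelihood of a test word can be obtained as `gmm.score(testingFeat) * len(testingFeat)`. The helper name `model_posteriors` and the three scores below are hypothetical.

```python
import numpy as np
from scipy.special import logsumexp


def model_posteriors(total_log_likelihoods, log_priors=None):
    """Hypothetical helper: turn per-model log-likelihoods into scores summing to 1.

    Assumes equal priors over the models unless log_priors is given.
    """
    ll = np.asarray(total_log_likelihoods, dtype=float)
    if log_priors is None:
        log_priors = np.full(ll.shape, -np.log(ll.size))  # uniform prior
    joint = ll + log_priors
    # softmax computed in log space, so very negative values don't underflow to 0/0
    return np.exp(joint - logsumexp(joint))


# hypothetical total log-likelihoods for three word models
scores = [-5200.0, -5190.0, -5300.0]
post = model_posteriors(scores)
print(post)
```

These values are only relative to the models in the set, and with thousands of frames they tend to be extremely peaked, so they express "which model fits best" rather than a calibrated confidence.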

Upvotes: 1
