Reputation: 61
How to get the standard deviation of each component in sklearn GMM after fit?
model.fit(dataSet)
model.means_ gives the mean of each component.
model.weights_ gives the mixing coefficient (weight) of each component.
Where can I find the standard deviations of the Gaussian components?
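For context, a minimal sketch of my setup (dataSet here is just stand-in random data):
import numpy as np
from sklearn.mixture import GaussianMixture

dataSet = np.random.rand(100, 2)        # stand-in for my real data
model = GaussianMixture(n_components=3)
model.fit(dataSet)
print(model.means_.shape)               # (3, 2): one mean per component
print(model.weights_.shape)             # (3,): one mixing weight per component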
Thanks,
Upvotes: 6
Views: 5736
Reputation: 326
You can read the variances off the diagonal of each component's covariance matrix: the first diagonal element is sigma_x^2 and the second is sigma_y^2, so taking the square root gives the standard deviations.
Basically, if you have N components in D dimensions and C is your GaussianMixture instance:
cov = C.covariances_   # shape (N, D, D) for covariance_type='full'
[np.sqrt(np.trace(cov[i]) / D) for i in range(N)]
will give you the RMS standard deviation of each component. Note that the divisor is the dimensionality D, not the number of components N (the two happen to coincide in the 2-D, two-component simulation below).
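If you want the per-axis standard deviations rather than one RMS value, take the square root of the diagonal directly. A minimal sketch, assuming covariance_type='full' (the toy data is illustrative):
import numpy as np
from sklearn import mixture

rng = np.random.default_rng(0)
data = rng.normal(loc=(0, 5), scale=(1, 3), size=(500, 2))  # toy 2-D data, true sigmas (1, 3)

C = mixture.GaussianMixture(n_components=1, covariance_type='full').fit(data)
for cov_i in C.covariances_:
    print(np.sqrt(np.diag(cov_i)))       # per-axis sigma_x, sigma_y
    print(np.sqrt(np.trace(cov_i) / 2))  # RMS sigma over the 2 dimensions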
I checked with the simulation below, and the estimates converge to within about 1% of the true values with hundreds to thousands of points:
# -*- coding: utf-8 -*-
"""
Created on Wed Jul 24 12:37:38 2019
- - -
Simulate two 2-D Gaussian point distributions.
Fit a GMM and check how the covariance elements relate to sigma.
@author: Adrien MAU / ISMO & Abbelight
"""
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from sklearn import mixture
colorsList = ['c','r','g']
CustomCmap = matplotlib.colors.ListedColormap(colorsList)
sigma1 = 16      # true std of cluster 1
sigma2 = 4       # true std of cluster 2
npoints = 2000   # points per cluster
x1 = np.random.normal(50, sigma1, npoints)
y1 = np.random.normal(70, sigma1, npoints)
x2 = np.random.normal(20, sigma2, npoints)
y2 = np.random.normal(50, sigma2, npoints)
x = np.hstack((x1, x2))
y = np.hstack((y1, y2))
C = mixture.GaussianMixture(n_components=2, covariance_type='full')
subdata = np.transpose(np.vstack((x, y)))   # shape (2*npoints, 2)
C.fit(subdata)
m = C.means_
w = C.weights_
cov = C.covariances_
print('\n')
print('estimated sigma 1 :', np.sqrt(np.trace(cov[0]) / 2))   # component order is arbitrary
print('estimated sigma 2 :', np.sqrt(np.trace(cov[1]) / 2))
plt.scatter(x1,y1)
plt.scatter(x2,y2)
plt.scatter( m[0,0], m[0,1])
plt.scatter( m[1,0], m[1,1])
plt.title('Initial data, and found Centroid')
plt.axis('equal')
gmm_sub_sigmas = [np.sqrt(np.trace(cov[i]) / 2) for i in range(2)]   # RMS sigma of each component
xdiff = (np.transpose(np.repeat([x], 2, axis=0)) - m[:, 0]) / gmm_sub_sigmas
ydiff = (np.transpose(np.repeat([y], 2, axis=0)) - m[:, 1]) / gmm_sub_sigmas
# np.hypot(xdiff, ydiff) alone is not the right criterion for Gaussian distributions:
# the negative log-likelihood of an isotropic 2-D Gaussian (up to a constant,
# and ignoring the mixture weights) is 0.5*(r/sigma)**2 + 2*log(sigma)
distances = 0.5 * np.hypot(xdiff, ydiff)**2 + 2 * np.log(gmm_sub_sigmas)
res2 = np.argmin( distances , axis=1)
plt.figure()
plt.scatter(x,y, c=res2, cmap=CustomCmap )
plt.axis('equal')
plt.title('GMM Associated data')
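Note that the hard assignment is also available directly from the fitted model (it uses the full covariances and the mixture weights), so you can compare the manual criterion above against:
res_exact = C.predict(subdata)   # exact component assignment from the fitted GMM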
Upvotes: 3
Reputation: 9
model.covariances_ will give you the covariance information.
The shape of the returned array depends on covariance_type, which is a parameter of the GMM.
For example, if covariance_type='diag', the returned array is a p×q matrix, where p is the number of Gaussian components and q is the number of dimensions of the input. Each entry is a variance, so taking the square root gives the standard deviation, as sketched below.
Please refer to http://scikit-learn.org/stable/auto_examples/mixture/plot_gmm_covariances.html for more information.
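A minimal sketch, with toy data and illustrative names:
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.rand(300, 3)             # toy data: 300 points in 3 dimensions
model = GaussianMixture(n_components=2, covariance_type='diag').fit(X)
stds = np.sqrt(model.covariances_)     # shape (2, 3): per-component, per-dimension sigma
print(stds)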
Upvotes: 0