Reputation: 65
I was reading a few papers about RNNs. At some point, I came across the following explanations:
The prediction model trained on sN is used to compute the error vectors for each point in the validation and test sequences. The error vectors are modelled to fit a multivariate Gaussian distribution N = N (μ, Σ). The likelihood p(t) of observing an error vector e(t) is given by the value of N at e(t) (similar to normalized innovations squared (NIS) used for novelty detection using Kalman filter based dynamic prediction model [5]). The error vectors for the points from vN1 are used to estimate the parameters μ and Σ using Maximum Likelihood Estimation.
And:
A Multivariate Gaussian Distribution is fitted to the error vectors on the validation set. y (t) is the probability of an error vector e (t) after applying Multivariate Gaussian Distribution N = N (µ, Σ). Maximum Likelihood Estimation is used to select the parameters µ and Σ for the points from vN.
vN and vN1 are validation datasets; sN is the training dataset.
They are from 2 different articles but describe the same thing. I didn't really understand what they mean by fitting a Multivariate Gaussian Distribution to the data. What does it mean?
Many thanks,
Guillaume
Upvotes: 2
Views: 1194
Reputation: 4990
Let's start with one-dimensional data. If your data are distributed along a 1D line, they have a mean (µ) and a standard deviation (sigma); the variance is sigma squared. Modeling them is then as simple as estimating (µ, sigma), after which you can generate new data points that follow the same distribution.
# Generating a new_point in a 1D Gaussian distribution
import random
mu, sigma = 1, 1.6
new_point = random.gauss(mu, sigma)
# e.g. 2.797757476598497 (the value changes on every run)
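Going the other way, from data to (µ, sigma), is the "fitting" step. Here is a minimal sketch, assuming the data really are Gaussian; the sample is synthetic and the names `data`, `mu_hat` and `sigma_hat` are only illustrative. For a 1D Gaussian, the maximum likelihood estimates are the sample mean and the population standard deviation (dividing by n rather than n - 1).

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Synthetic 1D data drawn from a known Gaussian (illustrative values)
data = [random.gauss(1.0, 1.6) for _ in range(10_000)]

# Maximum likelihood estimates of the Gaussian parameters
mu_hat = statistics.fmean(data)      # sample mean
sigma_hat = statistics.pstdev(data)  # population std (divides by n)

print(mu_hat, sigma_hat)  # should land close to 1.0 and 1.6
```

With enough points, the estimates should recover the parameters used to generate the data.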
Now, in N-dimensional space, the multivariate normal distribution is a generalization of the one-dimensional case. The objective is to estimate a mean vector µ with N entries and an N x N covariance matrix Σ that together model all the data points in the N-dimensional space. That is what "fitting" means in the passages you quote: with Maximum Likelihood Estimation, µ is the sample mean of the error vectors and Σ is their sample covariance. Once you have them, you can generate as many random data points as you want following that distribution. In Python/NumPy, you can do it like this:
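import numpy as np
mean = [0.0, 0.0]  # example 2D mean vector (illustrative values)
covariance = [[1.0, 0.5], [0.5, 2.0]]  # example 2 x 2 covariance matrix
new_data_point = np.random.multivariate_normal(mean, covariance, 1)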
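To connect this back to the quoted passages, here is a rough sketch of fitting a multivariate Gaussian to error vectors and then scoring a new error vector under it. It is not the papers' code: the error vectors are synthetic, and `errors`, `true_mean`, `true_cov` and `gaussian_log_density` are hypothetical names chosen for illustration. The log of the density is used instead of the raw value because it is numerically more stable.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

# Synthetic stand-in for validation error vectors: 5000 vectors of dimension 3
true_mean = np.array([0.0, 0.5, -0.5])
true_cov = np.array([[1.0, 0.3, 0.0],
                     [0.3, 2.0, 0.4],
                     [0.0, 0.4, 0.5]])
errors = rng.multivariate_normal(true_mean, true_cov, size=5000)

# Maximum likelihood estimates: sample mean and covariance (divide by n)
mu = errors.mean(axis=0)
sigma = np.cov(errors, rowvar=False, bias=True)

def gaussian_log_density(e, mu, sigma):
    """Log of the multivariate Gaussian density N(mu, sigma) evaluated at e."""
    d = mu.shape[0]
    diff = e - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = diff @ np.linalg.solve(sigma, diff)  # squared Mahalanobis distance
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

# An error vector at the fitted mean versus one far away from it
typical = gaussian_log_density(mu, mu, sigma)
unusual = gaussian_log_density(mu + np.array([6.0, 6.0, 6.0]), mu, sigma)
print(typical, unusual)
```

An error vector far from the fitted mean gets a much lower (log-)likelihood, which is presumably what the papers threshold to flag anomalies.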
Upvotes: 1