guillaume mendlevitch

Reputation: 65

Multivariate Normal Distribution fitting dataset

I was reading a few papers about RNNs. At some point, I came across the following explanations:

The prediction model trained on sN is used to compute the error vectors for each point in the validation and test sequences. The error vectors are modelled to fit a multivariate Gaussian distribution N = N (μ, Σ). The likelihood p(t) of observing an error vector e(t) is given by the value of N at e(t) (similar to normalized innovations squared (NIS) used for novelty detection using Kalman filter based dynamic prediction model [5]). The error vectors for the points from vN1 are used to estimate the parameters μ and Σ using Maximum Likelihood Estimation.

And:

A Multivariate Gaussian Distribution is fitted to the error vectors on the validation set. y (t) is the probability of an error vector e (t) after applying Multivariate Gaussian Distribution N = N (µ, Σ). Maximum Likelihood Estimation is used to select the parameters µ and Σ for the points from vN.

vN and vN1 are validation datasets. sN is the training dataset.

They are from 2 different articles but describe the same thing. I didn't really understand what they mean by fitting a Multivariate Gaussian Distribution to the data. What does it mean?

Many thanks,

Guillaume

Upvotes: 2

Views: 1194

Answers (1)

aminrd

Reputation: 4990

Let's start with one-dimensional data. If your data points lie on a 1D line, they have a mean (µ) and a standard deviation (σ, the square root of the variance). Modeling them then amounts to knowing (µ, σ), which is enough to generate new data points that follow the same distribution.

# Generating a new_point in a 1D Gaussian distribution
import random

mu, sigma = 1, 1.6
new_point = random.gauss(mu, sigma)
# e.g. 2.797757476598497 (the value changes from run to run)
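
Going the other way, from data to parameters, is what "fitting" refers to. Here is a minimal sketch, assuming a made-up list of 1D samples (the values are purely illustrative and not from the papers). Under maximum likelihood, the estimates are the sample mean and the population standard deviation, i.e. the one that divides by n rather than n - 1.

```python
import statistics

# Hypothetical 1D samples; illustrative values only
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Maximum likelihood estimates for a 1D Gaussian
mu_hat = statistics.fmean(samples)       # sample mean
sigma_hat = statistics.pstdev(samples)   # population standard deviation (divides by n)

print(mu_hat, sigma_hat)
```

The pair (mu_hat, sigma_hat) can then be passed to random.gauss to draw new points from the fitted distribution.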

In N-dimensional space, the multivariate normal distribution generalizes the one-dimensional case. The goal is now to find a mean vector µ with N entries and an N x N covariance matrix Σ that together model all data points in the N-dimensional space. "Fitting" the distribution to your data means estimating these two parameters from the data, which in your case are the error vectors. Once you have them, you can generate as many random data points as you want that follow the same distribution. In Python/NumPy, you can do it like this:

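import numpy as np
mean = [0.0, 0.0]                      # illustrative 2D mean vector
covariance = [[1.0, 0.5], [0.5, 2.0]]  # illustrative 2 x 2 covariance matrix
new_data_point = np.random.multivariate_normal(mean, covariance, 1)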
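
To connect this back to the quoted passages, here is a rough sketch of the fitting step itself, not the papers' actual code. It assumes the error vectors are stacked in an array called errors (a hypothetical name, filled with synthetic numbers here), estimates µ and Σ by maximum likelihood, and then evaluates the fitted density at a new error vector. The helper gaussian_log_density is also introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the validation error vectors: 500 points, 3 dimensions
errors = rng.normal(size=(500, 3))

# Maximum likelihood estimates of the parameters
mu = errors.mean(axis=0)                          # mean vector, shape (3,)
sigma = np.cov(errors, rowvar=False, bias=True)   # covariance matrix, shape (3, 3), divides by n


def gaussian_log_density(e, mu, sigma):
    """Log of the multivariate normal density N(mu, sigma) evaluated at e."""
    d = mu.shape[0]
    diff = e - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = diff @ np.linalg.solve(sigma, diff)    # squared Mahalanobis distance
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


# Likelihood of a new error vector under the fitted distribution
e_new = np.array([0.1, -0.2, 0.3])
p = np.exp(gaussian_log_density(e_new, mu, sigma))
print(p)
```

An error vector that is unusual compared with the validation errors gets a low value of p, which is how the quoted passages use the fitted distribution. Working with the log density avoids numerical underflow when the dimension is large.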

Upvotes: 1
