Mr.Robot

Reputation: 349

Generating synthetic data with Gaussian distribution

Problem

In a paper I am reading, the authors define a new metric and claim it has some advantages over previous metrics. They verify their claims on synthetic data, which looks like the following:

[Image: plot of the synthetic data from the paper]

The implementation of their metric is pretty straightforward. However, I am not sure how they create this kind of synthetic data.

What I Have Done

This looks like a Gaussian where x is restricted to certain intervals. I tried the following code but did not get anything similar to the plot presented in the paper.

import numpy as np

def generate_gaussian(size=1000, lb=-0.1, up=0.1):
    data = np.random.randn(5000)
    data = data[(data <= up) & (data >= lb)][:size]
    return data

np.random.seed(1234)
base = generate_gaussian()
background_pos = base + 0.3
background_neg = base + 0.7

[Image: plot of the data generated by the code above]

Now I am wondering whether the authors created this data using some special distribution (other than Gaussian) that I do not know about.

Upvotes: 0

Views: 1722

Answers (3)

Alfonso

Reputation: 713

You can use scipy.stats.norm.

import libraries

>>> from scipy.stats import norm
>>> from matplotlib import pyplot

plot

>>> pyplot.hist(norm.rvs(loc=1, scale=0.5, size=10000), bins=30, alpha=0.5, label='norm_1')
>>> pyplot.hist(norm.rvs(loc=5, scale=0.5, size=10000), bins=30, alpha=0.5, label='norm_2')
>>> pyplot.legend()
>>> pyplot.show()

[Image: histograms of the two normal samples]


Clarification:

A normal distribution is defined by its mean (loc, the center of the distribution) and its standard deviation (scale, a measure of the dispersion or width of the distribution). rvs generates random samples from the desired normal distribution; the size argument sets how many. For example, the following code generates 4 random samples from a normal distribution (mean = 1, SD = 1).

>>> norm.rvs(loc=1, scale=1, size=4)
array([ 0.52154255,  1.40873701,  1.55959291, -0.01730568])
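As a quick sanity check that loc and scale really are the mean and standard deviation, you can draw a large sample and compare the sample statistics with the parameters. This is only a sketch: the sample size and the random_state value are arbitrary choices of mine, not something from the question.

```python
import numpy as np
from scipy.stats import norm

# Draw a large sample; random_state (arbitrary value) makes the draw reproducible.
samples = norm.rvs(loc=1, scale=0.5, size=100_000, random_state=0)

# The sample mean should be close to loc, the sample SD close to scale.
sample_mean = samples.mean()
sample_sd = samples.std(ddof=1)
print(f"sample mean = {sample_mean:.3f}, sample SD = {sample_sd:.3f}")
```

The printed values should land near 1 and 0.5, and they get closer as size grows.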

Upvotes: 1

Alex

Reputation: 7065

NumPy has numpy.random.normal, which draws random samples from a normal (Gaussian) distribution.

import numpy as np
import matplotlib.pyplot as plt


sigma = 0.05
s0 = np.random.normal(0.2, sigma, 5000)
s1 = np.random.normal(0.6, sigma, 5000)

plt.hist(s0, 300, density=True, color="b")
plt.hist(s1, 300, density=True, color="r")
plt.xlim(0, 1)
plt.show()

[Image: histograms of s0 and s1]

You can change the values of mu (mean) and sigma (standard deviation) to alter the distributions:

mu = 0.55
sigma = 0.1
dist = np.random.normal(mu, sigma, 5000)
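If you need the two clusters together with class labels (for example, to feed them into a metric), something along these lines might work. It is a minimal sketch: the helper name two_gaussian_clusters, the means 0.3 and 0.7 (borrowed from the offsets in the question), sigma, and the seed are all assumptions, and I do not know how the paper's authors actually generated their data. It uses np.random.default_rng, the newer NumPy Generator interface, so the draw is reproducible.

```python
import numpy as np

def two_gaussian_clusters(mu_neg=0.3, mu_pos=0.7, sigma=0.05, n=1000, seed=1234):
    """Return (scores, labels) for two Gaussian clusters (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # seeded Generator for reproducibility
    neg = rng.normal(mu_neg, sigma, n)  # cluster centered at mu_neg
    pos = rng.normal(mu_pos, sigma, n)  # cluster centered at mu_pos
    scores = np.concatenate([neg, pos])
    # Label 0 for the first cluster, 1 for the second.
    labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    return scores, labels

scores, labels = two_gaussian_clusters()
print(scores[labels == 0].mean(), scores[labels == 1].mean())
```

A smaller sigma gives narrower, better-separated peaks; a larger one makes the two clusters overlap.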

Upvotes: 2

Simon Notley

Reputation: 2136

You have cut off the data at +/- 0.1. A standard Gaussian (mean 0, standard deviation 1) only 'looks Gaussian' if you view it over a range of approximately +/- 3. Try this:

import numpy as np

def generate_gaussian(size=1000, lb=-3, up=3):
    data = np.random.randn(5000)
    data = data[(data <= up) & (data >= lb)][:size]
    return data

np.random.seed(1234)
base = generate_gaussian()
background_pos = base + 5
background_neg = base + 15
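To see why the +/- 0.1 cut-off hides the bell shape, you can compute how much of a standard normal falls inside each window and how far the density drops between the peak and the window edge. This is a small sketch using only the standard library; the function names are my own.

```python
import math

def mass_within(a):
    """P(|Z| <= a) for a standard normal Z, via the error function."""
    return math.erf(a / math.sqrt(2))

def edge_to_peak_ratio(a):
    """Density at x = a relative to the density at the peak x = 0."""
    return math.exp(-a * a / 2)

for a in (0.1, 3.0):
    print(f"+/- {a}: mass = {mass_within(a):.4f}, "
          f"edge/peak density = {edge_to_peak_ratio(a):.4f}")
```

Within +/- 0.1 the window holds only about 8% of the samples, so filtering 5000 draws leaves fewer than the 1000 requested, and the density at the edge is still more than 99% of the peak, so the histogram looks almost flat. Within +/- 3 the window holds about 99.7% of the samples and the density falls to roughly 1% of the peak, which gives the familiar bell shape.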

Upvotes: 1
