
Generating synthetic data with Gaussian distribution

Problem

The paper I am currently reading defines a new metric, and the authors claim it has some advantages over previous metrics. They verify this claim with synthetic data, shown in a plot in the paper (image not reproduced here).

The implementation of their metric is pretty straightforward. However, I am not sure how they create this kind of synthetic data.

What I Have Done

This looks like a Gaussian where x is restricted to certain intervals. I tried the following code, but did not get anything similar to the plot presented in the paper.

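import numpy as np

def generate_gaussian(size=1000, lb=-0.1, up=0.1):
    # Draw 5000 standard-normal samples (mean 0, SD 1).
    data = np.random.randn(5000)
    # Keep only the samples inside [lb, up], then take at most `size` of them
    # (fewer than `size` if not enough samples fall inside the interval).
    data = data[(data <= up) & (data >= lb)][:size]
    return data

np.random.seed(1234)
base = generate_gaussian()
# Shift copies of the same base sample to two different centres.
background_pos = base + 0.3
background_neg = base + 0.7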

So I am wondering: did the authors generate this data using some special distribution (other than Gaussian) that I do not know about?

Upvotes: 0

Answers (3)

Alfonso

Reputation: 713

You can use scipy.stats.norm.

Import the libraries:

>>> from scipy.stats import norm
>>> from matplotlib import pyplot

Plot two overlapping histograms:

>>> pyplot.hist(norm.rvs(loc=1, scale=0.5, size=10000), bins=30, alpha=0.5, label='norm_1')
>>> pyplot.hist(norm.rvs(loc=5, scale=0.5, size=10000), bins=30, alpha=0.5, label='norm_2')
>>> pyplot.legend()
>>> pyplot.show()

Clarification:

A normal distribution is defined by its mean (loc, the centre of the distribution) and its standard deviation (scale, a measure of the distribution's spread or width). rvs draws random samples from that distribution, and the size argument sets how many. For example, the following code generates 4 random samples from a normal distribution with mean 1 and SD 1:

>>> norm.rvs(loc=1, scale=1, size=4)
array([ 0.52154255,  1.40873701,  1.55959291, -0.01730568])
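As a quick sanity check on what loc and scale mean, the sketch below draws a large sample and compares its sample mean and standard deviation with the parameters. It passes random_state (an existing keyword of rvs) only to make the draw reproducible; the seed 0 and the sample size of 100,000 are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

# Draw many samples from N(mean=1, SD=0.5); random_state makes the draw reproducible.
samples = norm.rvs(loc=1, scale=0.5, size=100_000, random_state=0)

# With this many samples, the sample statistics should sit close to loc and scale.
sample_mean = np.mean(samples)
sample_std = np.std(samples)
print(f"mean ~ {sample_mean:.3f}, SD ~ {sample_std:.3f}")
```

Increasing scale widens the histogram and moving loc shifts it, which is exactly what the two norm.rvs calls in the plot above do with loc=1 and loc=5.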

Upvotes: 1

Alex

Reputation: 7065

NumPy has numpy.random.normal, which draws random samples from a normal (Gaussian) distribution:

import numpy as np
import matplotlib.pyplot as plt


sigma = 0.05
s0 = np.random.normal(0.2, sigma, 5000)
s1 = np.random.normal(0.6, sigma, 5000)

plt.hist(s0, 300, density=True, color="b")
plt.hist(s1, 300, density=True, color="r")
plt.xlim(0, 1)
plt.show()
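For context, drawing from N(mu, sigma²) is the same as scaling and shifting standard-normal draws, mu + sigma * randn(n) (the NumPy randn documentation gives this recipe). So the question's idea of adding an offset to a base sample does work, provided the base sample is an untruncated normal with the desired sigma. The sketch below checks this with the legacy global generator; the seed and the values of mu, sigma and n are arbitrary.

```python
import numpy as np

mu, sigma, n = 0.6, 0.05, 5

# Draw directly from N(mu, sigma^2) with the legacy global generator.
np.random.seed(1234)
direct = np.random.normal(mu, sigma, n)

# Re-seed and build the same values by scaling and shifting standard-normal draws.
np.random.seed(1234)
shifted = mu + sigma * np.random.randn(n)

# The two approaches consume the same random stream, so the values agree.
print(np.allclose(direct, shifted))
```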

You can change mu (the mean) and sigma (the standard deviation) to alter the distributions:

mu = 0.55
sigma = 0.1
dist = np.random.normal(mu, sigma, 5000)
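Since the question's variables are named background_pos and background_neg, here is a minimal sketch of building a labelled two-class dataset from two Gaussians. It assumes the metric consumes (score, label) pairs, which the question does not state; the class centres 0.2 and 0.6 and sigma 0.05 are simply the values from the histogram example above, and it uses NumPy's newer default_rng Generator instead of the global np.random functions.

```python
import numpy as np

rng = np.random.default_rng(1234)  # modern Generator API; the seed is arbitrary

n_per_class = 5000
sigma = 0.05
# Hypothetical class centres, taken from the histogram example above.
neg_scores = rng.normal(loc=0.2, scale=sigma, size=n_per_class)
pos_scores = rng.normal(loc=0.6, scale=sigma, size=n_per_class)

# Stack into one dataset with 0/1 labels (0 = negative class, 1 = positive class).
scores = np.concatenate([neg_scores, pos_scores])
labels = np.concatenate([
    np.zeros(n_per_class, dtype=int),
    np.ones(n_per_class, dtype=int),
])
```

Moving each class's mu shifts it along the x-axis, and increasing sigma widens it, so these two parameters control how much the classes overlap.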

Upvotes: 2

Simon Notley

Reputation: 2136

You have cut off the data at +/- 0.1. A standard (normalised) Gaussian distribution only 'looks Gaussian' if you view it over a range of roughly +/- 3, i.e. about three standard deviations either side of the mean. Try this:

import numpy as np

def generate_gaussian(size=1000, lb=-3, up=3):
    data = np.random.randn(5000)
    data = data[(data <= up) & (data >= lb)][:size]
    return data

np.random.seed(1234)
base = generate_gaussian()
background_pos = base + 5
background_neg = base + 15
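To make the reasoning concrete: for a standard normal Z, the fraction of draws inside +/- k is erf(k/√2), and the density at the cutoff relative to the peak is exp(-k²/2). The sketch below (standard library only; the helper names are made up for illustration) evaluates both for the question's cutoff of 0.1 and for 3.

```python
import math

def fraction_within(k):
    """P(-k <= Z <= k) for a standard normal Z, via the error function."""
    return math.erf(k / math.sqrt(2))

def edge_to_centre_density(k):
    """Ratio of the standard normal density at z = k to its peak at z = 0."""
    return math.exp(-k * k / 2)

for k in (0.1, 3.0):
    print(f"cutoff +/-{k}: keeps {fraction_within(k):.4f} of draws, "
          f"density at edge is {edge_to_centre_density(k):.4f} of the peak")
```

For +/- 0.1 only about 8% of draws survive (so 5000 draws yield roughly 400 values, fewer than the requested 1000), and the density at the cutoff is still about 99.5% of the peak, which is why the histogram looks flat rather than bell-shaped. For +/- 3 about 99.7% survive and the density falls to roughly 1% of the peak at the edges.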

Upvotes: 1
