wildcolor

Reputation: 574

Generate 'normal-distribution'-like data based on one value in Python

I have one variable temp, say temp = 100. I want to generate 8 data points from it, laid out as shown in the figure. The result should look like a normal distribution, but with a lot of random variation added so that the points do not form a perfect normal curve. The final data (the area under the curve) should sum to temp. Could someone advise how to do this easily and neatly in Python?

I have tried the distribution functions in NumPy/Matplotlib. However, I can't work out how to get 8 points like those shown in the figure (x = 0, 1, 2, 3, 4, ...), or how to make them sum to 100.


Upvotes: 1

Views: 1547

Answers (1)

Sebastian Wozny

Reputation: 17506

By imposing the sum temp=100 you introduce a dependency between at least two data points, making it impossible to create a set of independently sampled random data points.

This answer on MathWorks provides more detailed information.

An easier example:

Imagine one coin flip. The randomness in the system is exactly one binary outcome, or 1 bit.

Imagine two coin flips. The randomness in the system is exactly two binary outcomes, or 2 bits.

Now imagine imposing a sum constraint on the two coin flips: say the flips (counting heads as 1 and tails as 0) must sum to exactly 1. The outcome of the second flip is then fully determined by the outcome of the first, so the randomness in the system shrinks.

The constraint therefore reduces the total randomness of the system from 2 bits to 1 bit.
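
To make the coin-flip argument concrete, here is a minimal sketch that enumerates the outcomes and counts the bits. It assumes fair coins, encodes tails as 0 and heads as 1, and uses the fact that for equally likely outcomes the entropy in bits is log2 of the number of outcomes. The variable names are only illustrative.

```python
from itertools import product
from math import log2

# All equally likely outcomes of two fair coin flips (0 = tails, 1 = heads).
outcomes = list(product([0, 1], repeat=2))

# Keep only the outcomes that satisfy the sum constraint (flips sum to 1).
constrained = [o for o in outcomes if sum(o) == 1]

# For equally likely outcomes, entropy in bits = log2(number of outcomes).
bits_free = log2(len(outcomes))
bits_constrained = log2(len(constrained))

print(outcomes)                     # the 4 unconstrained outcomes
print(constrained)                  # the 2 outcomes that sum to 1
print(bits_free, bits_constrained)  # 2.0 1.0
```

Only two of the four outcomes survive the constraint, and in both of them the second flip is fixed once the first is known.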

Sampling 8 truly independent (pseudo)random points from a normal distribution under a sum constraint is therefore not possible.

Your best bet is to sample 7 random points from a distribution with an appropriate mean (here temp/8) and then add an eighth point to the dataset that absorbs the difference:

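>>> import numpy as np
>>> temp = 100.0      # required total
>>> datapoints = 8    # number of points wanted
>>> dev = 1           # standard deviation of the sampled points
>>> # draw 7 independent points around the mean temp/datapoints
>>> data = np.random.normal(temp/datapoints, dev, datapoints-1)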
>>> print(data)
[ 11.70369328  10.77010243  11.20507387  12.40637644  12.81099137
  12.55329521  10.95809056]
>>> # the 8th point is whatever is left to reach temp
>>> data = np.append(data, temp - sum(data))
>>> data
array([ 11.70369328,  10.77010243,  11.20507387,  12.40637644,
        12.81099137,  12.55329521,  10.95809056,  17.59237685])
>>> sum(data)
100.0
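
If the points should also follow a bell shape over x = 0, 1, ..., 7 as the question describes, one possible variation (not the method shown above) is to evaluate a Gaussian bump at each x, multiply it by random noise, and rescale so the total equals temp. This is only a sketch: the function name noisy_bell_points and the parameters spread and noise are made up for illustration, and the uniform multiplicative noise is an arbitrary choice. Rescaling makes all 8 points depend on each other, which is consistent with the point above that a sum constraint rules out fully independent samples.

```python
import numpy as np

def noisy_bell_points(total, n_points=8, spread=1.5, noise=0.3, seed=None):
    """Return n_points noisy, bell-shaped values that sum to total.

    Hypothetical helper: 'spread' is the width of the bell in x units,
    'noise' is the maximum relative perturbation (keep it below 1 so
    that every value stays positive).
    """
    rng = np.random.default_rng(seed)
    x = np.arange(n_points)                  # x = 0, 1, ..., n_points - 1
    center = (n_points - 1) / 2              # peak in the middle of the range
    bell = np.exp(-0.5 * ((x - center) / spread) ** 2)
    # Perturb each point by a random factor in [1 - noise, 1 + noise).
    jitter = rng.uniform(1 - noise, 1 + noise, size=n_points)
    raw = bell * jitter
    # Rescale so the values sum to the requested total.
    return raw * (total / raw.sum())

points = noisy_bell_points(100.0, seed=0)
print(points)
print(points.sum())
```

Because of floating-point rounding, the printed sum may differ from 100 in the last digits; np.isclose(points.sum(), 100.0) is a safer check than ==.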

Upvotes: 1
