Reputation: 574
I have a single variable temp, say temp = 100. I want to generate 8 data points, laid out as shown in the figure. The shape resembles a normal distribution, but I want to add a lot of random noise so that the points do not follow a perfect normal curve. The final data (the area under the curve) should sum to temp. Could someone advise how to do this easily and neatly in Python?
I have tried the distribution functions in NumPy/Matplotlib. However, I can't see how to get 8 points like those shown in the figure (x = 0, 1, 2, 3, 4...), or how to make them sum to 100.
Upvotes: 1
Views: 1547
Reputation: 17506
By imposing the sum temp = 100, you introduce a dependency between at least two data points, which makes it impossible to create a set of independently sampled random data points.
This answer on MathWorks provides more detailed information.
An easier example:
Imagine one coin flip. The randomness in the system is exactly one binary outcome, or 1 bit.
Imagine two coin flips. The randomness in the system is exactly two binary outcomes, or 2 bits.
Now impose a sum constraint on the two coin flips: say the sum of the flips (counting heads as 1 and tails as 0) must equal exactly 1. The outcome of the second flip is then fully determined by the outcome of the first, so the randomness in the system shrinks.
You have therefore reduced the total randomness of the system from 2 bits to 1 bit.
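The coin-flip argument can be checked by simply counting outcomes. The following is a minimal sketch, assuming fair coins encoded as 0 (tails) and 1 (heads); the helper name `entropy_bits` is made up for this illustration and is not a library function.

```python
import itertools
import math

def entropy_bits(outcomes):
    """Entropy in bits of a uniform distribution over the given outcomes."""
    return math.log2(len(outcomes))

# All equally likely outcomes of two fair coin flips (0 = tails, 1 = heads).
two_flips = list(itertools.product([0, 1], repeat=2))

# Keep only the outcomes whose flips sum to exactly 1.
constrained = [flips for flips in two_flips if sum(flips) == 1]

free_bits = entropy_bits(two_flips)           # 4 outcomes
constrained_bits = entropy_bits(constrained)  # 2 outcomes: (0, 1) and (1, 0)
print(free_bits, constrained_bits)
```

Four unconstrained outcomes carry 2 bits, while the two outcomes that satisfy the constraint carry only 1 bit.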
Sampling 8 independent (pseudo)random points from a normal distribution under a sum constraint is therefore not possible.
Your best bet would be to sample 7 random points from a distribution with appropriate mean and then add a point to the dataset to absorb the difference:
>>> import numpy as np
>>> temp = 100.0
>>> datapoints = 8
>>> dev = 1
>>> data = np.random.normal(temp/datapoints, dev, datapoints-1)
>>> print(data)
[ 11.70369328 10.77010243 11.20507387 12.40637644 12.81099137
12.55329521 10.95809056]
>>> data = np.append(data,temp-sum(data))
>>> data
array([ 11.70369328, 10.77010243, 11.20507387, 12.40637644,
12.81099137, 12.55329521, 10.95809056, 17.59237685])
>>> sum(data)
100.0
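If the bell-like shape from the question matters more than how the constraint is absorbed, a different approach from the one in the session above is to perturb a bell-shaped profile over x = 0..7 and then rescale all points so that they sum to temp. This is only a sketch under assumptions: the function name `noisy_bell_points`, the profile width, and the multiplicative uniform noise are illustrative choices, not something taken from the question. As explained above, the resulting points are still not independent, because the rescaling ties them together.

```python
import numpy as np

def noisy_bell_points(total=100.0, n_points=8, noise=0.3, seed=None):
    """Return n_points noisy, roughly bell-shaped values that sum to total.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    x = np.arange(n_points)

    # Gaussian-shaped profile centred on the middle of the x range
    # (the width is an arbitrary choice for this sketch).
    center = (n_points - 1) / 2
    width = n_points / 4
    profile = np.exp(-0.5 * ((x - center) / width) ** 2)

    # Multiplicative noise in [1 - noise, 1 + noise] keeps every value positive.
    noisy = profile * rng.uniform(1 - noise, 1 + noise, n_points)

    # Rescale so the values sum to the requested total (up to rounding error).
    return noisy * (total / noisy.sum())

points = noisy_bell_points(total=100.0, seed=0)
print(points)
print(points.sum())
```

Compared with letting a single eighth point absorb the whole difference, rescaling spreads the correction across all points, so no single point ends up noticeably off from the rest. A larger `noise` value makes the shape look less like a clean bell curve.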
Upvotes: 1