Reputation: 69
I am trying to generate a synthetic earthquake database where the number of events ($N$) with magnitude ($M$) in the range $[M, M+\delta_M]$ follows:
$\log_{10}(N) = a - bM$
where $a$ and $b$ are constants.
I am trying to do this in Python using the random
module. I know I can (or at least I think I can - as I haven't tried it) use random.expovariate
but I thought I could use random.random
with a transformation like:
-math.log10(random.random()))
I ran this for 2,000,000 samples which I then binned into 0.1 bins and plotted on a log scale.
The red line shows the theoretical distribution used to generate the synthetic samples.
I'm not worried about the variation above x=4.5. This is due to small number of points and natural randomness. What I am asking about is the very small (at this scale) variation for the points near x=0. I plotted the variation of the synthetic points from the theoretical (blue dots):
As x decreases the number of events increase exponentially so the variation from the theoretical should decrease - not increase. And the point at x=0 is the opposite sense.
To try and work out where my problem lies I wrote code that generated numbers from 0 to 1 with a very fine step. Each number then went through the function noted above. The result (the blue dots in the above figure) is purely linear that exactly matches the theoretical values. This indicates that my transformation function and code is fine.
So the only difference between the twp sets of points in the above figure is that the blue ones are generated by 2,000,000 calls to the random function (results are then transformed into magnitudes and binned), while for the red ones I've taken 2,000,000 uniform steps between 0 and 1 (results are then transformed into magnitudes and binned using the same code).
So I'm thinking it's somehow something to do with the random number generator?
Would be grateful for any pointers. Thanks.
[added]
Changed the call from random.random to random.uniform(0,1)
as suggested by @Arty and the errors are now symmetrically distributed and of the expected magnitude. Have added +- 1 standard deviation to the plot.
Clearly random()
and uniform(0,1)
are doing something slightly differently.
I cut down my code and calculated synthetic data using random.random
, random.uniform(0,1)
, np.random.random
and np.random.uniform(0, 1)
for 2,000,000 points.
Binned the results and plotted the difference between the observed and expected numbers (below).
Also added in the +-1 standard deviation limits. The numbers are all symmetrically distributed and of the correct magnitude indicating that ALL the random generators are working fine.
My conclusion is that somewhere along the line of changing/refining code I introduced a problem which has now been lost. I would dearly like to find that error so I don't make it again!
I am surprised that my original, incorrect code could perform correctly to the extent that it generated a real looking synthetic with only minor anomalies that were difficult to detect.
Thanks for everyone's help and apologies to those I disagreed with that said the problem did not lie with the random number generators!
Upvotes: 3
Views: 581
Reputation: 4273
Initially I thought you might be having some numerical analysis problem. Trying a million samples in python, however, I get the following observed results:
>>> T = int(1e6)
>>> xs = [ -math.log10(random.random()) for i in range(T)]
>>> len([x for x in xs if 0 <= x < 0.1])
205614
>>> len([x for x in xs if 0.1 <= x < 0.2])
163736
>>> len([x for x in xs if 0.2 <= x < 0.3])
129627
>>> len([x for x in xs if 0.3 <= x < 0.4])
103413
>>> len([x for x in xs if 0.4 <= x < 0.5])
81734
If X = -log_10(x) with x uniformly distributed on [0, 1), then we should have
P(M <= X < M + d) = P(-M-d < log_10(x) <= -M) = 10^(-M) - 10^(-M-d)
and the numbers above are basically perfectly in line with these probabilities, e.g.
1 - 10^(-0.1) = 0.205672
which matches up nicely with our observed 205614 out of a million trials above.
Do you get different results than I do for the python code above?
Upvotes: 2