Reputation: 1513
While writing a Monte Carlo simulation to estimate the expected distance between two random points in $n$-dimensional space, I discovered that the following two similar-looking methods of generating random points give different results, and I'm not able to figure out why.
Method 1:
def expec_distance1(n, N = 10000):
    u = uniform(0,1)
    dist = 0
    for i in range(N):
        # draw each coordinate with a separate rvs() call
        x = np.array([u.rvs() for i in range(n)])
        y = np.array([u.rvs() for i in range(n)])
        # running average of the distances seen so far
        dist = (dist*i + euclidean_dist(x,y))/(i+1.0)
    return dist
Method 2:
def expec_distance2(n, N = 10000):
    u = uniform(0,1)
    dist = 0
    for i in range(N):
        # draw all n coordinates in a single vectorized call
        x = u.rvs(n)
        y = u.rvs(n)
        dist = (dist*i + euclidean_dist(x,y))/(i+1.0)
    return dist
where uniform is scipy.stats.uniform and np stands for numpy.
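(euclidean_dist isn't shown here; a plain Euclidean distance such as the following would work:)

def euclidean_dist(x, y):
    # L2 (Euclidean) distance between two coordinate vectors
    return np.linalg.norm(x - y)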
For 100 runs of the two methods (with $n = 2$) I get:

method 1: $\mu = 0.53810011995126483$, $\sigma = 0.13064091613389378$
method 2: $\mu = 0.52155615672453093$, $\sigma = 0.0023768774304696902$
Why is there such a big difference between the standard deviations of the two methods?
Here is the code to try: https://gist.github.com/swairshah/227f056e6acee07db6778c3ae746685b (I've replaced scipy with numpy because it's faster, but it shows the same difference between the standard deviations.)
Upvotes: 1
Views: 59
Reputation: 76773
In Python 2, list comprehensions leak their loop variable. Since you're looping over i in your list comprehensions ([u.rvs() for i in range(n)]), that i is the one used in dist = (dist*i + euclidean_dist(x,y))/(i+1.0): it always equals n-1 rather than the value of the main loop variable. For n = 2, i is stuck at 1, so every iteration computes dist = (dist + d)/2. That is an exponentially weighted average dominated by the last few samples, not a running mean over all N samples, which is why the result fluctuates so much between runs.
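A minimal sketch of one possible fix (a hypothetical expec_distance1_fixed, reusing the question's uniform, np and euclidean_dist): rename the comprehension variable so it cannot shadow the outer i, or just draw whole vectors with u.rvs(n) as Method 2 does:

def expec_distance1_fixed(n, N = 10000):
    u = uniform(0,1)
    dist = 0
    for i in range(N):
        # '_' as the comprehension variable no longer shadows the outer i in Python 2
        x = np.array([u.rvs() for _ in range(n)])
        y = np.array([u.rvs() for _ in range(n)])
        dist = (dist*i + euclidean_dist(x,y))/(i+1.0)
    return dist

You can see the leak directly in a Python 2 session:

>>> i = 10
>>> [0 for i in range(3)]
[0, 0, 0]
>>> i   # the comprehension's loop variable leaked and overwrote i
2

In Python 3 the comprehension gets its own scope, so i would still be 10.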
Upvotes: 2