Swair

Reputation: 1513

Difference between two methods of random point generation

While writing a Monte Carlo simulation to estimate the expected distance between two random points in $n$-dimensional space, I discovered that the following two similar-looking methods of generating random points give different results, and I'm not able to figure out why.

Method 1:

def expec_distance1(n, N=10000):
    u = uniform(0, 1)
    dist = 0
    for i in range(N):
        x = np.array([u.rvs() for i in range(n)])
        y = np.array([u.rvs() for i in range(n)])
        dist = (dist*i + euclidean_dist(x, y))/(i + 1.0)
    return dist

Method 2:

def expec_distance2(n, N=10000):
    u = uniform(0, 1)
    dist = 0
    for i in range(N):
        x = u.rvs(n)
        y = u.rvs(n)
        dist = (dist*i + euclidean_dist(x, y))/(i + 1.0)
    return dist

where uniform is scipy.stats.uniform and np stands for numpy.
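(For completeness, the snippets above assume roughly the following setup. The euclidean_dist helper is not shown in the question, so this definition is only a guess at its intent:)

import numpy as np
from scipy.stats import uniform

def euclidean_dist(x, y):
    # Euclidean (L2) distance between two points given as numpy arrays
    return np.linalg.norm(x - y)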

For 100 runs of the two methods (with $n = 2$), method 1 gives $\mu = 0.53810011995126483$, $\sigma = 0.13064091613389378$, while method 2 gives $\mu = 0.52155615672453093$, $\sigma = 0.0023768774304696902$.
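(The per-run estimates were presumably aggregated along these lines; this loop is my reconstruction, not code from the question:)

runs1 = [expec_distance1(2) for _ in range(100)]
runs2 = [expec_distance2(2) for _ in range(100)]
print(np.mean(runs1), np.std(runs1))
print(np.mean(runs2), np.std(runs2))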

Why is there such a big difference between the standard deviations of the two methods?

Here is the code to try: https://gist.github.com/swairshah/227f056e6acee07db6778c3ae746685b (I've replaced scipy with numpy because it's faster, but it shows the same difference in standard deviation.)

Upvotes: 1

Views: 59

Answers (1)

Mike Graham

Reputation: 76773

In Python 2, list comprehensions leak their loop variables.

Since you're looping over i in your list comprehensions ([u.rvs() for i in range(n)]), that i is the one used in dist = (dist*i + euclidean_dist(x,y))/(i+1.0): it always equals n-1 rather than the value of the outer loop variable.
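Concretely, with n = 2 the leaked value is i = 1, so every update in method 1 collapses to dist = (dist + new_sample)/2. That is an exponentially weighted average dominated by the last few samples, so its standard deviation does not shrink as N grows, while method 2 computes a true running mean. A minimal demonstration of the leak:

i = 100
squares = [i * i for i in range(5)]
print(i)  # Python 2 prints 4 (i leaked from the comprehension); Python 3 prints 100

One way to fix method 1 under Python 2 (a sketch, assuming the same imports and euclidean_dist helper as in the question) is to give the comprehension its own variable name so it cannot clobber the outer loop's i:

def expec_distance1_fixed(n, N=10000):
    u = uniform(0, 1)
    dist = 0
    for i in range(N):
        # use _ as the comprehension variable so i is left untouched
        x = np.array([u.rvs() for _ in range(n)])
        y = np.array([u.rvs() for _ in range(n)])
        # running mean of the sampled distances, now with the correct i
        dist = (dist*i + euclidean_dist(x, y))/(i + 1.0)
    return dist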

Upvotes: 2
