Juan
Juan

Reputation: 2103

Compute random values inside a euclidean distance efficiently (in terms of time) Python

Could you help me compute N random pairs vectors x and y with size d and such that x_i and y_i belong to the real range U[-1,1] and such that the euclidean distance between x and y could be less or equal than a small T? I need to compute these N pairs efficiently (in terms of time) in Python.

I tried

 import numpy as np
 d = 4
 inv_d = 1.0 / d
 random_lst = []
 T = 0.05
 N = 10
 for i in range(N):
     while(1):
         x = np.random.uniform(low=-1, high=1, size=d)
         y = np.random.uniform(low=-1, high=1, size=d)
         length = np.linalg.norm(x-y)
         if length <= T:
             random_lst.append([x,y])
             print(length)
             break
 print(random_lst)

But it took a lot of time (40s), and I need N near from 10^6, that is maybe it could be take even more time

Upvotes: 2

Views: 307

Answers (4)

tobias_k
tobias_k

Reputation: 82899

(Building upon the idea by constt, but I think there are two problems with the approach: The split function is not really uniform, and the distance is not quite right.)

The idea is: The distance between two points is sqrt(sum(i) (xi-yi)**2), so to get a random point at exactly that distance we can split the square of the distance into random parts, then get the sqrt of those.

First, to get a uniform split into k parts, we can generate k-1 uniformly distributed numbers, and then get the pairwise difference of those (when sorted) and the two boundaries:

def split(val, k):
    nums = np.random.uniform(low=0, high=val, size=k-1)
    return np.diff([0] + sorted(nums) + [val])

for _ in range(10):
    sp = split(100, 10)
    print([round(x, 1) for x in sp], sum(sp), len(sp))

Then to get the pairs of points, we generate one random point, then apply the difference by splitting the square of the difference and getting the sqrt of the parts:

def points_pair(dim, max_dist):
    X = np.random.uniform(low=-1, high=1, size=dim)
    diff = [x**.5 for x in split(max_dist**2, dim)]
    return X, X + diff

for _ in range(10):
    x, y = points_pair(512, 0.1)
    print(np.linalg.norm(x - y))

This will only generate points at exactly the max_dist and only in the "upper-right" quadrant (or whatever is it's equivalent in 512 dimensions). To get a random second point in any quadrant and up to that distance, also randomize the distance and the signs of the components:

    dist = np.random.uniform(0, max_dist)
    diff = [x**.5 * random.choice([+1, -1]) for x in split(dist**2, dim)]

(You might also have to check and re-roll the diff if X + diff is outside the bounds of [-1, +1].)

Upvotes: 0

Martin Wettstein
Martin Wettstein

Reputation: 2894

You could just make y dependent on x. If it has to lie less than T away from x, just create a random vector with the absolute length smaller than T and add it to x. So, you get a pair of vectors close to each other.

Since the distance in no dimension may be larger than T, the space in which this vector may lie is bounded by a d-dimensional cube with size T. That's a way smaller space than the original [-1,1] space.

x = np.random.uniform(low=-1, high=1, size=d)
dxy = np.random.uniform(low=-T, high=T, size=d) ## Maybe, you are lucky in the first go
while np.linalg.norm(dxy)>T:
    dxy = np.random.uniform(low=-T, high=T, size=d) ## Repeat until lucky.
y = np.add(x,dxy)

EDIT:

If you need it much faster, you should not randomly pick from a box to find a point within a given range. Just rescale the deviation to a random value smaller than T, regardless of its original length:

x = np.random.uniform(low=-1, high=1, size=d)
dxy = np.random.uniform(low=-1, high=1, size=d) ## will be too large in most cases
scale = T/np.linalg.norm(dxy) ## Factor by which it has to be multiplied to be exactly T
dxy = np.multiply(dxy,random.random()*scale) ## Random value between 0 and scale
y = np.add(x,dxy)

So, you only have to randomly pick one second vector and only have to compute one length per pair. That should speed things up, as these are the time limiting operations.

Upvotes: 1

constt
constt

Reputation: 2320

Here is a quick and straightforward approximate solution to the problem. The resulting vectors will be always less and close enough but not equal to the given length. First, let's implement a helper function that is used to split any natural number into a set of positive random numbers which add up to the given input number.

def split(num, buckets=1, result=None):

    if result is None:
        result = []

    if buckets < 2:
        result.append(num)
        np.random.shuffle(result)
        return result

    bucket = np.random.uniform(low=0, high=num) if num > 0 else 0
    result.append(bucket)

    return split(num - bucket, buckets - 1, result)

Here is how it works

>>> xs = split(10, buckets=3)
>>> xs
[7.60495737395197, 0.6968567573189194, 1.698185868729111]
>>> sum(xs)
10.0

Now, let's make a function that returns a pair of points from the Euclidean space given the coordinate bounds, number of dimensions, and distance.

def point_pair(low=-1, high=1, dim=2, dist=0.05):
    x = np.random.uniform(low=-1, high=1, size=dim)
    diff = split(dist, dim)
    y = np.clip(x + diff, low, high)
    return x, y

This is how it works:

>>> point_pair(dim=4, dist=0.05)
(array([ 0.18856095, -0.02635086, -0.59616698, -0.51447733]),
 array([ 0.20485765, -0.01791704, -0.59026813, -0.49510669]))

Finally, let's test it out:

>>> pairs = []
>>> for _ in range(10):
        pairs.append(point_pair(dim=4, dist=0.05))
>>> all(np.linalg.norm(x - y) < 0.05 for x, y in pairs)
True

Here is a simple run test on a fairly slow machine Intel(R) Core(TM) m5-6Y54 CPU @ 1.10GHz:

>>> %timeit point_pair(dim=4, dist=0.05)
38.6 µs ± 4.52 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Upvotes: 1

eagr
eagr

Reputation: 58

One improvement I can think of is to pick y from the circle around x with radius T (instead of randomly picking it everywhere and checking if that's in the circle). This way you don't need the inner loop and it may reduce the run time significantly.

Upvotes: 0

Related Questions