Reputation: 2103
Could you help me compute N random pairs vectors x and y with size d and such that x_i and y_i belong to the real range U[-1,1] and such that the euclidean distance between x and y could be less or equal than a small T? I need to compute these N pairs efficiently (in terms of time) in Python.
I tried
import numpy as np
d = 4
inv_d = 1.0 / d
random_lst = []
T = 0.05
N = 10
for i in range(N):
while(1):
x = np.random.uniform(low=-1, high=1, size=d)
y = np.random.uniform(low=-1, high=1, size=d)
length = np.linalg.norm(x-y)
if length <= T:
random_lst.append([x,y])
print(length)
break
print(random_lst)
But it took a lot of time (40s), and I need N near from 10^6, that is maybe it could be take even more time
Upvotes: 2
Views: 307
Reputation: 82899
(Building upon the idea by constt, but I think there are two problems with the approach: The split
function is not really uniform, and the distance is not quite right.)
The idea is: The distance between two points is sqrt(sum(i) (xi-yi)**2)
, so to get a random point at exactly that distance we can split
the square of the distance into random parts, then get the sqrt
of those.
First, to get a uniform split into k
parts, we can generate k-1
uniformly distributed numbers, and then get the pairwise difference of those (when sorted) and the two boundaries:
def split(val, k):
nums = np.random.uniform(low=0, high=val, size=k-1)
return np.diff([0] + sorted(nums) + [val])
for _ in range(10):
sp = split(100, 10)
print([round(x, 1) for x in sp], sum(sp), len(sp))
Then to get the pairs of points, we generate one random point, then apply the difference by splitting the square of the difference and getting the sqrt of the parts:
def points_pair(dim, max_dist):
X = np.random.uniform(low=-1, high=1, size=dim)
diff = [x**.5 for x in split(max_dist**2, dim)]
return X, X + diff
for _ in range(10):
x, y = points_pair(512, 0.1)
print(np.linalg.norm(x - y))
This will only generate points at exactly the max_dist
and only in the "upper-right" quadrant (or whatever is it's equivalent in 512 dimensions). To get a random second point in any quadrant and up to that distance, also randomize the distance and the signs of the components:
dist = np.random.uniform(0, max_dist)
diff = [x**.5 * random.choice([+1, -1]) for x in split(dist**2, dim)]
(You might also have to check and re-roll the diff if X + diff
is outside the bounds of [-1, +1]
.)
Upvotes: 0
Reputation: 2894
You could just make y
dependent on x
. If it has to lie less than T
away from x
, just create a random vector with the absolute length smaller than T
and add it to x. So, you get a pair of vectors close to each other.
Since the distance in no dimension may be larger than T
, the space in which this vector may lie is bounded by a d-dimensional cube with size T
. That's a way smaller space than the original [-1,1] space.
x = np.random.uniform(low=-1, high=1, size=d)
dxy = np.random.uniform(low=-T, high=T, size=d) ## Maybe, you are lucky in the first go
while np.linalg.norm(dxy)>T:
dxy = np.random.uniform(low=-T, high=T, size=d) ## Repeat until lucky.
y = np.add(x,dxy)
EDIT:
If you need it much faster, you should not randomly pick from a box to find a point within a given range. Just rescale the deviation to a random value smaller than T
, regardless of its original length:
x = np.random.uniform(low=-1, high=1, size=d)
dxy = np.random.uniform(low=-1, high=1, size=d) ## will be too large in most cases
scale = T/np.linalg.norm(dxy) ## Factor by which it has to be multiplied to be exactly T
dxy = np.multiply(dxy,random.random()*scale) ## Random value between 0 and scale
y = np.add(x,dxy)
So, you only have to randomly pick one second vector and only have to compute one length per pair. That should speed things up, as these are the time limiting operations.
Upvotes: 1
Reputation: 2320
Here is a quick and straightforward approximate solution to the problem. The resulting vectors will be always less and close enough but not equal to the given length. First, let's implement a helper function that is used to split any natural number into a set of positive random numbers which add up to the given input number.
def split(num, buckets=1, result=None):
if result is None:
result = []
if buckets < 2:
result.append(num)
np.random.shuffle(result)
return result
bucket = np.random.uniform(low=0, high=num) if num > 0 else 0
result.append(bucket)
return split(num - bucket, buckets - 1, result)
Here is how it works
>>> xs = split(10, buckets=3)
>>> xs
[7.60495737395197, 0.6968567573189194, 1.698185868729111]
>>> sum(xs)
10.0
Now, let's make a function that returns a pair of points from the Euclidean space given the coordinate bounds, number of dimensions, and distance.
def point_pair(low=-1, high=1, dim=2, dist=0.05):
x = np.random.uniform(low=-1, high=1, size=dim)
diff = split(dist, dim)
y = np.clip(x + diff, low, high)
return x, y
This is how it works:
>>> point_pair(dim=4, dist=0.05)
(array([ 0.18856095, -0.02635086, -0.59616698, -0.51447733]),
array([ 0.20485765, -0.01791704, -0.59026813, -0.49510669]))
Finally, let's test it out:
>>> pairs = []
>>> for _ in range(10):
pairs.append(point_pair(dim=4, dist=0.05))
>>> all(np.linalg.norm(x - y) < 0.05 for x, y in pairs)
True
Here is a simple run test on a fairly slow machine Intel(R) Core(TM) m5-6Y54 CPU @ 1.10GHz:
>>> %timeit point_pair(dim=4, dist=0.05)
38.6 µs ± 4.52 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Upvotes: 1
Reputation: 58
One improvement I can think of is to pick y from the circle around x with radius T (instead of randomly picking it everywhere and checking if that's in the circle). This way you don't need the inner loop and it may reduce the run time significantly.
Upvotes: 0