Reputation: 83695

Generate 4000 unique pseudo-random cartesian coordinates FASTER?

The range for x and y is from 0 to 99.

I am currently doing it like this:

excludeFromTrainingSet = []
while len(excludeFromTrainingSet) < 4000:
    tempX = random.randint(0, 99)
    tempY = random.randint(0, 99)
    if [tempX, tempY] not in excludeFromTrainingSet:
        excludeFromTrainingSet.append([tempX, tempY])

But it takes ages and I really need to speed this up.

Any ideas?

Upvotes: 2

Answers (8)

PaulMcG

Reputation: 63739

Make a list of all possible (x,y) values:

allpairs = list((x,y) for x in xrange(99) for y in xrange(99))

# or with Py2.6 or later:
from itertools import product
allpairs = list(product(xrange(99),xrange(99)))

# or even taking DRY to the extreme
allpairs = list(product(*[xrange(99)]*2))

Shuffle the list:

from random import shuffle
shuffle(allpairs)

Read off the first 'n' values:

n = 4000
trainingset = allpairs[:n]

This runs pretty snappily on my laptop.

Upvotes: 4

Mike Axiak

Reputation: 12004

I'm sure someone is going to come in here with a usage of numpy, but how about using a set and tuple? E.g.:

excludeFromTrainingSet = set()
while len(excludeFromTrainingSet) < 40000:
    temp = (random.randint(0, 99), random.randint(0, 99))
    if temp not in excludeFromTrainingSet:
        excludeFromTrainingSet.add(temp)

EDIT: Isn't this an infinite loop since there are only 100^2 = 10000 POSSIBLE results, and you're waiting until you get 40000?

Upvotes: 4

aaronasterling

Reputation: 71014

Vincent Savard has an answer that's almost twice as fast as the first solution offered here.

Here's my take on it. It requires tuples instead of lists for hashability:

def method2(size):
    ret = set()
    while len(ret) < size:
        ret.add((random.randint(0, 99), random.randint(0, 99)))
    return ret

Just make sure that the limit is sane as other answerers have pointed out. For sane input, this is better algorithmically O(n) as opposed to O(n^2) because of the set instead of list. Also, python is much more efficient about loading locals than globals so always put this stuff in a function.

EDIT: Actually, I'm not sure that they're O(n) and O(n^2) respectively because of the probabilistic component but the estimations are correct if n is taken as the number of unique elements that they see. They'll both be slower as they approach the total number of available spaces. If you want an amount of points which approaches the total number available, then you might be better off using:

import random
import itertools

def method2(size, min_, max_):
    range_ = range(min_, max_)
    points = itertools.product(range_, range_)
    return random.sample(list(points), size)

This will be a memory hog but is sure to be faster as the density of points increases because it avoids looking at the same point more than once. Another option worth profiling (probably better than last one) would be

def method3(size, min_, max_):
    range_ = range(min_, max_)
    points = list(itertools.product(range_, range_))

    N = (max_ - min_)**2
    L =  N - size
    i = 1
    while i <= L:
        del points[random.randint(0, N - i)]
        i += 1
    return points

Upvotes: 6

user506432

Reputation: 1

## for py 3.0+
## generate 4000 points in 2D
##
import random 
maxn = 10000
goodguys = 0
excluded = [0 for excl in range(0, maxn)]
for ntimes in range(0, maxn):
  alea = random.randint(0, maxn - 1)
  excluded[alea] += 1
  if(excluded[alea] > 1): continue 
  goodguys += 1
  if goodguys > 4000: break
  two_num = divmod(alea, 100)  ## Unfold the 2 numbers
  print(two_num)

Upvotes: 0

hughdbrown

Reputation: 49013

Taking Vince Savard's code:

>>> from random import choice
>>> def method2(size):
...     randints = range(0, 100)
...     excludeFromTrainingSet = set()
...     while True:
...         x = size - len(excludeFromTrainingSet)
...         if not x:
...             break
...         else:
...             excludeFromTrainingSet.add((choice(randints), choice(randints)) for _ in range(x))
...     return excludeFromTrainingSet
... 
>>> s = method2(4000)
>>> len(s)
4000

This is not a great algorithm because it has to deal with collisions, but the tuple-generation makes it tolerable. This runs in about a second on my laptop.

Upvotes: 0

Vincent Savard

Reputation: 35927

My suggestion :

def method2(size):
    randints = range(0, 100)
    excludeFromTrainingSet = set()

    while len(excludeFromTrainingSet) < size:
        excludeFromTrainingSet.add((random.choice(randints), random.choice(randints)))
    return excludeFromTrainingSet

Instead of generation 2 random numbers every time, you first generate the list of numbers from 0 to 99, then you choose 2 and appends to the list. As others pointed out, there are only 10 000 possibilities so you can't loop until you get 40 000, but you get the point.

Upvotes: 4

user395760

Reputation:

Generating 40 thousand numbers inevitably will take a while. But you are performing an O(n) linear search on the excludeFromTrainingSet, which takes quite a while especially later in the process. Use a set instead. You could also consider generating a number of coordinate sets e.g. over night and pickle them, so you don't have to generate new data for each test run (dunno what you're doing, so this might or might not help). Using tuples, as someone noted, is not only the semantically correct choice, it might also help with performance (tuple creation is faster than list creation). Edit: Silly me, using tuples is required when using sets, since set members must be hashable and lists are unhashable.

But in your case, your loop isn't terminating because 0..99 is 100 numbers and two-tuples of them have only 100^2 = 10000 unique combinations. Fix that, then apply the above.

Upvotes: 1

oldSkool

Reputation: 1232

You could make a lookup table of random values... make a random index into that lookup table, and then step through it with a static increment counter...

Upvotes: 1

Generate 4000 unique pseudo-random cartesian coordinates FASTER?

Answers (8)

Related Questions