Reputation: 480
The sort of problem I am dealing with involves a few things; namely, splice is not going to work. I start by picking a number between 1 and 1,000,000,000.
my $random = int(rand(1_000_000_000)) + 1;
I add a value, say 100, to that, so that $random and $random + 100 define an interval.
my $interval = $random + 100;
Then I push both $random and $interval into another array, which stores the intervals.
push ( @rememberOldIntervals, $random, $interval );
I step through @rememberOldIntervals using a for loop, pulling out items in pairs: the first of a pair is a former $random and the second its matching $interval. Inside this for loop, I generate another random number, but the new number can't fall within an interval already taken. If it does, I keep sampling until I find a number that is free. Further, this new random number must be at least 100 away from any old interval.
for ( my $i = 0; $i < scalar @rememberOldIntervals; $i += 2 ) {
    $random = int(rand(1_000_000_000)) + 1;
    my $new_random_low  = $random - 100;
    my $new_random_high = $random + 100;
    # Accept the candidate if its padded range clears this stored pair...
    if (   $new_random_high < $rememberOldIntervals[$i]
        or $new_random_low  > $rememberOldIntervals[$i + 1] ) {
        push @rememberOldIntervals, $new_random_low, $new_random_high;
    }
    else {
        # ...otherwise resample until it does.
        until (   $new_random_high < $rememberOldIntervals[$i]
               or $new_random_low  > $rememberOldIntervals[$i + 1] ) {
            $random          = int(rand(1_000_000_000)) + 1;
            $new_random_low  = $random - 100;
            $new_random_high = $random + 100;
        }
    }
}
This latter loop would need to be embedded within another loop that drives it many times, say 10,000 times; a rough sketch of that driver follows.
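Roughly, the outer driver I have in mind looks like this (a sketch of the intent only; the names $pad and $count, and the written-out overlap test, are mine):
use strict;
use warnings;

my $max   = 1_000_000_000;
my $pad   = 100;
my $count = 10_000;

my @rememberOldIntervals;    # flat list: low1, high1, low2, high2, ...

# Keep sampling until $count intervals are stored.
while ( @rememberOldIntervals < 2 * $count ) {
    my $random = int(rand($max)) + 1;
    my ( $low, $high ) = ( $random - $pad, $random + $pad );

    # Reject the candidate if its padded range touches any stored interval.
    my $clear = 1;
    for ( my $i = 0; $i < @rememberOldIntervals; $i += 2 ) {
        if (    $high >= $rememberOldIntervals[$i]
            and $low  <= $rememberOldIntervals[ $i + 1 ] ) {
            $clear = 0;
            last;
        }
    }
    push @rememberOldIntervals, $low, $high if $clear;
}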
Upvotes: 1
Views: 145
Reputation: 35208
This problem can be reframed as pulling 10,000 random numbers between 0 and 1 billion, where no number is within 100 of another.
Brute Force - 5 secs
Because you're only pulling 10,000 numbers, and probably don't need to do it very often, I suggest approaching this type of problem with brute force initially, in the spirit of "premature optimization is the root of all evil".
In this case, that means just pulling random numbers and comparing each one to all previously pulled numbers. That's O(N^2) in comparisons, but it also takes less code.
use strict;
use warnings;
my $max = 1_000_000_000;
my $dist = 100;
my $count = 10_000;
die "Too many numbers" if 2 * $dist * $count >= $max;
my @numbers;
while (@numbers < $count) {
    my $num = int rand $max;
    # Keep the pick only if it's at least $dist from every previous pick.
    push @numbers, $num if ! grep {abs($num - $_) < $dist} @numbers;
}
print scalar(@numbers), "\n";
Output takes 5 seconds:
10000
Binary Search for faster generation - 0.14 secs
Now for a faster algorithm. I agree with ysth
that a much more efficient way to solve this is to keep two lists of your random numbers: one in the order generated, the other sorted. Use the sorted list to binary-search the insertion point, then compare the candidate to its neighboring elements to see whether it is within 100 of either.
This reduces the number of comparisons from O(N^2)
to O(N log N)
. The following takes just 0.14 seconds to run versus the 5 seconds of the brute force method.
use strict;
use warnings;
my $max = 1_000_000_000;
my $dist = 100;
my $count = 10_000;
die "Too many numbers" if 2 * $dist * $count >= $max;
my @numbers;
my @sorted = (-$dist, $max); # Include edges to simplify binary search logic.

while (@numbers < $count) {
    my $num = int rand $max;

    # Binary search of sorted list for the insertion point.
    my $binary_min = 0;
    my $binary_max = $#sorted;
    while ($binary_max > $binary_min) {
        my $average = int( ($binary_max + $binary_min) / 2 );
        $binary_max = $average     if $sorted[$average] >= $num;
        $binary_min = $average + 1 if $sorted[$average] <= $num;
    }

    # Accept only if both neighbors are at least $dist away.
    if (! grep {abs($num - $_) < $dist} @sorted[$binary_max, $binary_max - 1]) {
        splice @sorted, $binary_max, 0, $num;
        push @numbers, $num;
    }
}
print scalar(@numbers), "\n";
Hash of quotients for fastest - 0.05 secs
I inquired in the comments: "Could you simplify this problem to pick a random multiple of 100? That would ensure no overlap, and then you'd just need to pick a random number from 1 to 10 million without repeat, and then just multiply it by 100." You didn't respond, but we can still use grouping by multiples of 100 to simplify this problem.
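For reference, a minimal sketch of that simplification might look like the following (the %seen hash and the overall structure are my own illustration, not code from the question):
use strict;
use warnings;

my $max   = 1_000_000_000;
my $dist  = 100;
my $count = 10_000;

# Distinct multiples of $dist are automatically at least $dist apart.
my %seen;
my @numbers;
while (@numbers < $count) {
    my $multiple = int rand($max / $dist);    # 0 .. 9_999_999
    push @numbers, $multiple * $dist unless $seen{$multiple}++;
}
print scalar(@numbers), "\n";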
Basically, if we keep track of each number under the quotient of its division by 100, we only need to compare a candidate against the numbers stored under its own quotient and the quotients one above and one below. This reduces the number of comparisons to O(N)
, which, not surprisingly, is the fastest at 0.05 seconds:
use strict;
use warnings;
my $max = 1_000_000_000;
my $dist = 100;
my $count = 10_000;
die "Too many numbers" if 2 * $dist * $count >= $max;
my @numbers;
my %num_per_quot;
while (@numbers < $count) {
    my $num      = int rand $max;
    my $quotient = int $num / $dist;

    # Compare only against the numbers stored in this bucket and its two neighbors.
    if (! grep {defined && abs($num - $_) < $dist}
            map {$num_per_quot{$quotient + $_}} (-1, 0, 1)) {
        push @numbers, $num;
        $num_per_quot{$quotient} = $num;
    }
}
print scalar(@numbers), "\n";
Caution if you're on Windows
If you run this code on Windows with a version of perl older than v5.20, you'll need a better random number generator than the built-in rand
. For reasons why, read avoid using rand if it matters
.
I used use Math::Random::MT qw(rand);
in this code since I'm on Strawberry Perl v5.18.2. However, starting with Perl v5.20 this is no longer a concern, because rand now uses a consistent random number generator
.
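The swap-in is just one extra line at the top of each script, assuming Math::Random::MT is installed from CPAN:
use strict;
use warnings;
use Math::Random::MT qw(rand);    # exported rand() overrides the builtin

# Call sites stay exactly the same, e.g.:
print int(rand(1_000_000_000)) + 1, "\n";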
Upvotes: 1
Reputation: 525
You can speed it up by using hashes and indices.
The code below partitions the space into indexed segments of width 200 and places each interval at a random position within a randomly chosen free segment.
my $interval = 100;
my $space = 1e9;
my $interval_count = 1e4;
my @values;
my %index_taken;
for(1..$interval_count)
{
    # Draw segment indices until we find one that is still free.
    my $index;
    1 while $index_taken{ $index = int rand $space/2/$interval }++;

    # Place the interval at a random offset inside segment $index.
    my $start = $index*2*$interval + 1 + int rand $interval;
    push @values, $start, $start+$interval;
}
It guarantees nonoverlapping intervals, but there will be inaccessible space of up to 200 between two intervals, since each 100-wide interval floats inside its own 200-wide segment.
Or, if you want the intervals sorted:
@values = map  {$_*=2*$interval; $_+=1+int rand $interval; ($_,$_+$interval)}
          sort {$a <=> $b} keys %index_taken;
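To double-check the spacing, one can scan the sorted starts (continuing with the @values and $interval from above):
# Extract the interval starts (even indices) and confirm the gaps.
my @starts = sort { $a <=> $b } @values[ grep { $_ % 2 == 0 } 0 .. $#values ];
for my $i (1 .. $#starts) {
    die "overlap at $starts[$i]\n" if $starts[$i] - $starts[$i-1] <= $interval;
}
print "all ", scalar(@starts), " intervals are disjoint\n";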
Upvotes: 1