lazy

Reputation: 77

Slowness in Data::Random

Hi, I'm using the Data::Random module to generate random dates, but it is very slow when generating a sample of 1 million dates. How can I speed it up? Here is the code I have tried:

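#!/usr/bin/perl
use strict;
use warnings;

use Data::Random qw(:all);

my $randDate_Start = '1900-01-01';
my $randDate_End = '2010-12-31';

open Outfile, ">", "D:/Test.txt" or die "Cannot open D:/Test.txt: $!";

# 1..1_000_000 gives exactly one million dates (0..1000000 would give 1,000,001)
for (1..1_000_000)
{
     my $randDate = rand_date( min=>$randDate_Start, max=>$randDate_End);
     print Outfile $randDate."\n";
}

close Outfile;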

Is there any other way to generate random dates?

Upvotes: 1

Views: 76

Answers (3)

Glenn

Reputation: 1167

I would start by unrolling the loop. You may not be able to unroll it a million times, but you can probably unroll it a large number of times and loop far fewer times. That should help speed it up, because the loop does not have to branch back for every item. In a short test I saw roughly a 5 to 10 fold speedup. Here is what I would propose for the loop of 1 million (if I have my math correct :))

# Declare the variable before the loop
my $randDate;
# Statement is what we want to execute a number of times.
# Single quotes stop $randDate, $randDate_Start and $randDate_End from being
# interpolated now; they are resolved when the string is evaluated.
my $statement = '$randDate = rand_date( min => $randDate_Start, max => $randDate_End ); print Outfile $randDate . "\n";';
# Replicate the statement 1000 times
$statement = $statement x 1000;
# Get the time we started (to the second)
my $start = time();
# Loop 1000 times to make 1 million items
for (1..1000)
{
  # Evaluate the 1000 statements; stop if the string failed to compile
  eval $statement;
  die $@ if $@;
}
# Determine the amount of time it took
my $diff = time() - $start;
# Print out the time
print "Time taken is: $diff\n";

When I did this, it took 107 seconds when I looped a million times, and 28 seconds when I used the above method to generate 1 million items.
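As a variation on the unrolling approach above (not part of the original answer), the repeated statement text can be compiled once into an anonymous sub with a single string eval, so the 1000 unrolled statements are not recompiled on every pass. In this minimal sketch the loop body is a hypothetical stand-in that pushes a random day offset instead of calling rand_date(), so it runs without Data::Random; the names $days, $unroll, @offsets and $chunk are illustrative.

```perl
use strict;
use warnings;

# Sketch: unroll a loop body by repeating its source text and compiling
# it ONCE into an anonymous sub, instead of string-eval'ing it each pass.
# The body (push a random day offset) is a hypothetical stand-in for the
# rand_date()/print statement in the question.

my $days   = 40542;    # number of dates from 1900-01-01 to 2010-12-31
my $unroll = 1000;     # statements per compiled chunk
my @offsets;

# Single quotes: nothing is interpolated while building the source text.
my $body = 'push @offsets, int rand $days;' x $unroll;

# Compile once; the resulting sub closes over @offsets and $days.
my $chunk = eval "sub { $body }"
    or die "Unrolled code failed to compile: $@";

# 1000 calls x 1000 unrolled statements = 1 million items
$chunk->() for 1 .. 1_000_000 / $unroll;

print scalar(@offsets), "\n";    # prints 1000000
```

Swapping the body for the question's rand_date()/print statement applies the same idea there; how much unrolling helps depends on how much of the run time is loop overhead rather than the work inside the body.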

If that is not enough speed, then you may have to write your own routine to generate the dates. The range covers 111 years, and at roughly 365.25 days per year that is about 40543 dates (exactly 40542, since 1900 was not a leap year). These can be generated once at the start: build an array holding each date in the range, then use rand to generate an index between 0 and the last index of the array, and print the date at that index. This is a bit more work than the above, so it is only worth doing if the above does not provide sufficient speedup.
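A minimal sketch of this precomputed-array idea using only core Perl: the dates are built with plain arithmetic and the Gregorian leap-year rule, so no date module is needed. The helper names is_leap and all_dates are hypothetical, and Test.txt stands in for the question's D:/Test.txt.

```perl
use strict;
use warnings;

# Gregorian leap-year rule: every 400th year is leap, other century
# years are not, and otherwise every 4th year is leap.
sub is_leap {
    my ($y) = @_;
    return 1 if $y % 400 == 0;
    return 0 if $y % 100 == 0;
    return $y % 4 == 0 ? 1 : 0;
}

# Build every YYYY-MM-DD date from Jan 1 of $from_year to Dec 31 of $to_year.
sub all_dates {
    my ( $from_year, $to_year ) = @_;
    my @mdays = ( 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 );
    my @dates;
    for my $y ( $from_year .. $to_year ) {
        for my $m ( 1 .. 12 ) {
            # February gets an extra day in leap years
            my $last = $mdays[ $m - 1 ] + ( $m == 2 ? is_leap($y) : 0 );
            push @dates, sprintf( '%04d-%02d-%02d', $y, $m, $_ ) for 1 .. $last;
        }
    }
    return @dates;
}

my @dates = all_dates( 1900, 2010 );    # built once, before the loop
print scalar(@dates), "\n";             # prints 40542

# rand @dates returns a number in [0, scalar @dates); used as an index
# it is truncated, so every date (including 2010-12-31) can be picked.
open my $out, '>', 'Test.txt' or die "Cannot open Test.txt: $!";
print {$out} $dates[ rand @dates ], "\n" for 1 .. 1_000_000;
close $out or die "Cannot close Test.txt: $!";
```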

Upvotes: 0

Miller

Reputation: 35198

I suggest using Time::Piece.

It gives roughly a 6-fold increase in performance, as the benchmarks below demonstrate.

And if you cache the possible date values, you can get a practically instantaneous result for all 1 million values:

#!/usr/bin/perl -w
use strict;
use warnings;
use autodie;

use Benchmark;
use Data::Random qw(:all);
use Time::Piece;
use Time::Seconds;

my $randDate_Start = '1900-01-01';
my $randDate_End   = '2010-12-31';

my $tp_start = Time::Piece->strptime( "$randDate_Start 12:00:00", "%Y-%m-%d %T" );
my $tp_end   = Time::Piece->strptime( "$randDate_End 12:00:00",   "%Y-%m-%d %T" );
my $tp_days  = ( $tp_end - $tp_start )->days;

my @tp_cached = map { ( $tp_start + ONE_DAY * $_ )->strftime('%Y-%m-%d') } ( 0 .. $tp_days );

# Compare Data Methods
timethese(
    1_000_000,
    {   'Data::Random'         => sub { rand_date( min => $randDate_Start, max => $randDate_End ) },
        # rand( $tp_days + 1 ) and rand @tp_cached let the end date 2010-12-31 be picked too
        'Time::Piece'          => sub { ( $tp_start + ONE_DAY * int rand( $tp_days + 1 ) )->strftime('%Y-%m-%d') },
        'Time::Piece (cached)' => sub { $tp_cached[ rand @tp_cached ] },
    }
);

Output:

Benchmark: timing 1000000 iterations of Data::Random, Time::Piece, Time::Piece (cached)...
Data::Random: 61 wallclock secs (60.20 usr +  0.07 sys = 60.27 CPU) @ 16592.00/s (n=1000000)
Time::Piece: 10 wallclock secs ( 9.95 usr +  0.01 sys =  9.96 CPU) @ 100401.61/s (n=1000000)
Time::Piece (cached):  0 wallclock secs ( 0.08 usr +  0.00 sys =  0.08 CPU) @ 12500000.00/s (n=1000000)
            (warning: too few iterations for a reliable count)

Upvotes: 2

clt60

Reputation: 63892

Using the second technique that @Glenn recommends, without any further optimisation:

use 5.010;
use strict;
use warnings;
use Date::Calc qw(Delta_Days Add_Delta_Days);

# create an array with one entry per day
my $numdays = Delta_Days(1900,1,1, 2010,12,31) + 1;
# offsets 0 .. $numdays-1 cover 1900-01-01 through 2010-12-31
my @dates = map { sprintf("%d-%02d-%02d", Add_Delta_Days(1900,1,1, $_)) } 0..$numdays-1;

say $dates[ rand($numdays) ] for(1..100_000_000);

Running it:

$ time perl dat | wc -l
 100000000

real    0m32.227s
user    0m31.439s
sys     0m1.159s

That is for 100_000_000 dates. For 1 million it takes about 1.2 seconds.

Upvotes: 1
