Reputation: 77
Hi, I'm using the Data::Random module to generate random dates, but it is very slow when generating 1 million sample values. How can I speed it up? Here is the code I have tried:
#!/usr/bin/perl -w
use Data::Random qw(:all);
my $randDate_Start = '1900-01-01';
my $randDate_End = '2010-12-31';
open Outfile, ">", "D:/Test.txt";
for(0..1000000)
{
my $randDate = rand_date( min=>$randDate_Start, max=>$randDate_End);
print Outfile $randDate."\n";
}
close Outfile;
Is there any other way to generate random dates?
Upvotes: 1
Views: 76
Reputation: 1167
I would start by unrolling the loop. You may not be able to unroll it a million times, but you can probably unroll it a large number of times and loop far fewer times. That helps because the code does not have to branch back for each item. In a short test I saw about a 5- to 10-fold speedup. Here is what I would propose for the loop of 1 million (if I have my math correct :))
# Declare the variable before the loop
my $randDate;
# Statement is what we want to execute a number of times.
# Single quotes keep the variables from being interpolated before eval runs.
my $statement = '$randDate = rand_date( min=>$randDate_Start, max=>$randDate_End); print Outfile $randDate."\n";';
# Replicate the statement 1000 times
$statement = $statement x 1000;
# Get the time we started (to the second)
my $start = time();
# Loop 1000 times to make 1 million items
for(1..1000)
{
# Evaluate the 1000 statements, and stop if the generated code fails
eval($statement);
die $@ if $@;
}
# Determine the amount of time it took
my $diff = time() - $start;
# Print out the time
print "Time taken is: $diff\n";
When I did this, it took 107 seconds when I looped a million times and 28 seconds when I used the above method to generate 1 million items.
If that is not enough speed, then you may have to write your own routine to generate the dates. Given the range, there are 111 years, and at 365.25 days per year that is a range of about 40543 dates. These can be generated once at the start: build an array holding each date in the time frame, then use rand to pick an integer index from 0 up to (but not including) the number of dates. That index selects the date. This is a bit more work than the above, so it is only worth doing if the above does not provide a sufficient speedup.
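The precomputed-array idea described above can be sketched as follows. This is a minimal illustration rather than a tuned implementation: the helper names `is_leap_year`, `build_dates`, and `random_date` are made up for this example, and it assumes the range covers whole calendar years, as 1900-01-01 to 2010-12-31 does. It uses no modules beyond `strict` and `warnings`.

```perl
use strict;
use warnings;

# Hypothetical helper: true for Gregorian leap years.
sub is_leap_year {
    my ($year) = @_;
    my $leap = ( $year % 4 == 0 && $year % 100 != 0 ) || $year % 400 == 0;
    return $leap;
}

# Hypothetical helper: builds every YYYY-MM-DD string for the whole
# years $first_year .. $last_year and returns them as an array ref.
sub build_dates {
    my ( $first_year, $last_year ) = @_;
    my @days_in_month = ( 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 );
    my @dates;
    for my $year ( $first_year .. $last_year ) {
        for my $month ( 1 .. 12 ) {
            my $days = $days_in_month[ $month - 1 ];
            $days++ if $month == 2 && is_leap_year($year);
            push @dates,
                map { sprintf '%04d-%02d-%02d', $year, $month, $_ } 1 .. $days;
        }
    }
    return \@dates;
}

# Build the table once, up front.
my $dates = build_dates( 1900, 2010 );

# rand(N) returns a float in [0, N); the array subscript truncates it
# to an integer index, so every date is equally likely.
sub random_date {
    return $dates->[ rand( scalar @$dates ) ];
}

print random_date(), "\n" for 1 .. 5;
```

For this range the table holds 40,542 entries, so the 365.25-days-per-year estimate is off by one day; sizing the random index from the array itself avoids depending on the estimate.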
Upvotes: 0
Reputation: 35198
I suggest using Time::Piece.
It shows a 6 fold increase in performance as demonstrated by the below benchmarks.
And if you cache the possible date values, you can get a pretty much instantaneous result of all 1 million values:
#!/usr/bin/perl -w
use strict;
use warnings;
use autodie;
use Benchmark;
use Data::Random qw(:all);
use Time::Piece;
use Time::Seconds;
my $randDate_Start = '1900-01-01';
my $randDate_End = '2010-12-31';
my $tp_start = Time::Piece->strptime( "$randDate_Start 12:00:00", "%Y-%m-%d %T" );
my $tp_end = Time::Piece->strptime( "$randDate_End 12:00:00", "%Y-%m-%d %T" );
my $tp_days = ( $tp_end - $tp_start )->days;
my @tp_cached = map { ( $tp_start + ONE_DAY * $_ )->strftime('%Y-%m-%d') } ( 0 .. $tp_days );
# Compare Data Methods
timethese(
1_000_000,
{ 'Data::Random' => sub { rand_date( min => $randDate_Start, max => $randDate_End ) },
'Time::Piece' => sub { ( $tp_start + ONE_DAY * int rand( $tp_days + 1 ) )->strftime('%Y-%m-%d') },
'Time::Piece (cached)' => sub { $tp_cached[ rand @tp_cached ] },
}
);
Outputs:
Benchmark: timing 1000000 iterations of Data::Random, Time::Piece, Time::Piece (cached)...
Data::Random: 61 wallclock secs (60.20 usr + 0.07 sys = 60.27 CPU) @ 16592.00/s (n=1000000)
Time::Piece: 10 wallclock secs ( 9.95 usr + 0.01 sys = 9.96 CPU) @ 100401.61/s (n=1000000)
Time::Piece (cached): 0 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU) @ 12500000.00/s (n=1000000)
(warning: too few iterations for a reliable count)
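To connect the cached lookup back to the original task of writing 1 million dates to a file, here is a rough sketch. The helper names `build_date_table` and `write_random_dates` and the output file name `random_dates.txt` are assumptions made for this example. It also assumes Time::Piece can parse dates as early as 1900 on your platform, which the benchmark above suggests but which I have not verified everywhere.

```perl
use strict;
use warnings;
use Time::Piece;
use Time::Seconds;

# Hypothetical helper: returns an array ref holding every date from
# $start to $end inclusive, formatted as YYYY-MM-DD.
sub build_date_table {
    my ( $start, $end ) = @_;
    my $first = Time::Piece->strptime( "$start 12:00:00", '%Y-%m-%d %T' );
    my $last  = Time::Piece->strptime( "$end 12:00:00",   '%Y-%m-%d %T' );

    # Round to a whole number of days between the two noon timestamps.
    my $span = int( ( $last - $first )->days + 0.5 );
    return [ map { ( $first + ONE_DAY * $_ )->strftime('%Y-%m-%d') } 0 .. $span ];
}

# Hypothetical helper: writes $count random dates to $path, one per line.
sub write_random_dates {
    my ( $path, $count, $table ) = @_;
    open my $out, '>', $path or die "Cannot open $path: $!";
    print {$out} $table->[ rand( scalar @$table ) ], "\n" for 1 .. $count;
    close $out or die "Cannot close $path: $!";
    return;
}

# Build the table once, then each date costs only an array lookup.
my $table = build_date_table( '1900-01-01', '2010-12-31' );
write_random_dates( 'random_dates.txt', 1_000_000, $table );
```

Sizing the random index from the table itself means the end date can be selected too, and the lexical filehandle with an explicit error check reports a bad path instead of failing silently.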
Upvotes: 2
Reputation: 63892
Using the second technique that @Glenn recommends, without any optimisation:
use 5.010;
use strict;
use warnings;
use Date::Calc qw(Delta_Days Add_Delta_Days);
#create an array for each day
my $numdays = Delta_Days(1900,1,1, 2010,12,31) + 1;
my @dates = map { sprintf("%d-%02d-%02d", Add_Delta_Days(1900,1,1, $_)) } 0..$numdays-1;
say $dates[ rand($numdays) ] for(1..100_000_000);
Running it:
$ time perl dat | wc -l
100000000
real 0m32.227s
user 0m31.439s
sys 0m1.159s
That is for 100_000_000 dates. For 1 million it takes 1.2 seconds.
Upvotes: 1