Reputation: 13062
I have a CSV file with several columns. For example:
"00000089-6d83-486d-9ddf-30bbbf722583","2011-09-17 16:25:09","INTNAME","1001","https://mobile.mint.com:443"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","2000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"
These are sample lines from a huge file that I need to parse. I need to select only those lines where the 4th column is in a certain list (say 1000, 2000, ...) and the second column falls between certain dates (say 2011-11-01 00:00:00 to 2011-11-15 00:00:00).
So, how do I do this date selection and output only the matching lines in tab-delimited form?
In the example, only the second row would be chosen and saved in tab-delimited form to another file.
Upvotes: 1
Views: 270
Reputation: 311
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
use utf8;
use Carp;
use Date::Parse;
use English qw(-no_match_vars);
our $VERSION = '0.01';
my @list = qw(1000 2000 3000);
# Date range to keep (inclusive), converted to epoch seconds.
my $start_date = str2time('2011-11-01 00:00:00');
my $end_date = str2time('2011-11-15 00:00:00');
#my $input_time = str2time($input_date);
my $RGX_FOUR_FULL = qr{"([^"]+)","([^"]+)","([^"]+)","([^"]+)","([^"]+)"}smo;
my $RGX_DATE_FULL = qr{.*"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})".*}smo;
my @input_data = <DATA>;
my @res =
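grep {
my $time = extract_time($_);
my $code = extract_four($_);
# Plain grep for list membership; smartmatch (~~) is deprecated in recent Perls.
$time >= $start_date
and $time <= $end_date
and grep { $code eq $_ } @list
} @input_data;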
print @res;
#say 'Z';
sub extract_time {
my ($search_str) = @_;
$search_str =~ s/$RGX_DATE_FULL/$1/sm;
return str2time($search_str);
}
sub extract_four {
my ($search_str) = @_;
$search_str =~ s/$RGX_FOUR_FULL/$4/sm;
chomp($search_str);
#print $search_str;
return $search_str;
}
__DATA__
"00000089-6d83-486d-9ddf-30bbbf722583","2011-08-17 16:25:09","INTNAME","1001","https://mobile.mint.com:443"
"00000089-6d83-486d-9ddf-30bbbf722583","2011-09-17 16:25:09","INTNAME","1001","https://mobile.mint.com:443"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","2000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-10 14:52:30","INTNAME","4000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","3000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"
Running it prints:
"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","2000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","3000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"
Upvotes: 0
Reputation: 91428
Using Parse::CSV, here is a way to do the job:
#!/usr/local/bin/perl
use Modern::Perl;
use Parse::CSV;
my $parser = Parse::CSV->new(
file => 'text.csv',
);
while ( my $value = $parser->fetch ) {
if ($value->[3] > 1000 && $value->[3] <= 2000
&& $value->[1] gt '2011-11-01 00:00:00'
&& $value->[1] lt '2011-11-15 00:00:00' ) {
say "$value->[0] --> OK";
}else {
say "$value->[0] --> KO";
}
}
Output:
00000089-6d83-486d-9ddf-30bbbf722583 --> KO
000004c9-92c6-4764-b320-b1403276321e --> OK
You can also use the filter capability:
my $parser = Parse::CSV->new(
file => 'text.csv',
filter => sub{
if ($_->[3] > 1000 && $_->[3] <= 2000
&& $_->[1] gt '2011-11-01 00:00:00'
&& $_->[1] lt '2011-11-15 00:00:00' ) {
return $_;
}else {
return undef;
}
}
);
while ( my $value = $parser->fetch ) {
# do what you want with the filtered rows
}
Upvotes: 2
Reputation: 51157
First, that looks like CSV, so you should parse it with Text::CSV_XS (or Text::CSV). The "standard" module for handling dates and times in Perl is DateTime, used together with DateTime::Format::ISO8601 or similar, but Date::Parse is also a possibility.
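A minimal sketch of that suggestion, assuming Text::CSV is installed and using the code list and date bounds from the question. The helper name filter_rows and the in-memory sample are only illustrative. Because the timestamps use a fixed, zero-padded YYYY-MM-DD HH:MM:SS layout, plain string comparison orders them correctly, so this sketch does the date check without a date module; swap in DateTime or Date::Parse if your input is less regular.

```perl
use strict;
use warnings;
use Text::CSV;

# Assumed values, taken from the question's example.
my %wanted = map { $_ => 1 } qw(1000 2000 3000);
my ($from, $to) = ('2011-11-01 00:00:00', '2011-11-15 00:00:00');

# Illustrative helper: read CSV rows from the handle $in and return
# the matching rows as tab-delimited strings.
sub filter_rows {
    my ($in) = @_;
    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
    my @out;
    while (my $row = $csv->getline($in)) {
        next unless @$row >= 4;            # skip short or empty rows
        next unless $wanted{ $row->[3] };  # 4th column must be in the list
        # Fixed-width timestamps compare correctly as strings.
        next unless $row->[1] ge $from && $row->[1] le $to;
        push @out, join("\t", @$row);
    }
    return @out;
}

# Sample input held in memory; a real script would open the CSV file instead.
my $sample = <<'CSV';
"00000089-6d83-486d-9ddf-30bbbf722583","2011-09-17 16:25:09","INTNAME","1001","https://mobile.mint.com:443"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","2000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"
CSV
open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
print "$_\n" for filter_rows($in);
```

The hash lookup keeps the list test cheap even on a huge file, and reading row by row with getline avoids loading the whole file into memory.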
Upvotes: 1
Reputation: 8895
You may want to take a look at Time::Piece. Use it like this, for instance:
use Time::Piece;
# strptime() takes strftime()-style format codes.
my $time = Time::Piece->strptime($date, "%Y%m%d %H:%M");
(Apply the format that matches your data; for the timestamps in the question that would be "%Y-%m-%d %H:%M:%S".)
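As a rough sketch of how this could apply to the question's range check: Time::Piece objects overload the comparison operators, so parsed timestamps can be compared directly. The helper name in_range and the format string are assumptions chosen to match the sample data, not part of the answer above.

```perl
use strict;
use warnings;
use Time::Piece;

# Format matching the question's timestamps, e.g. "2011-11-09 13:52:30".
my $FORMAT = '%Y-%m-%d %H:%M:%S';

# Range bounds from the question, parsed once up front.
my $start = Time::Piece->strptime('2011-11-01 00:00:00', $FORMAT);
my $end   = Time::Piece->strptime('2011-11-15 00:00:00', $FORMAT);

# Illustrative helper: true if $stamp lies within [$start, $end].
sub in_range {
    my ($stamp) = @_;
    my $t = Time::Piece->strptime($stamp, $FORMAT);
    return $t >= $start && $t <= $end;
}

for my $stamp ('2011-09-17 16:25:09', '2011-11-09 13:52:30') {
    printf "%s is %s\n", $stamp, in_range($stamp) ? 'in range' : 'out of range';
}
```

Time::Piece ships with Perl, so this needs no extra installation.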
Upvotes: 1