Reputation: 41
I am writing a perl script to parse tab delimited data from standard input.
The script removes leading and trailing whitespace, blanks out any field that has the string "NULL" and re-formats date columns from "MMM DD YYYY HH:MM:SS:SSSAM" format to "YYYYMMDD" format.
Sample Input:
93092 Apr 1 2010 12:00:00:000AM 59668370.60702875
22341 Apr 1 2010 12:00:00:000AM 51309196.84639429
27844 Apr 1 2010 12:00:00:000AM NULL
150465 Apr 22 2010 12:00:00:000AM 19706190.97586569
119364 Jul 20 2010 12:00:00:000AM 16335977.41009162
Target Output:
93092|20100401|59668370.60702875
22341|20100401|51309196.84639429
27844|20100401|
150465|20100422|19706190.97586569
119364|20100720|16335977.41009162
The script takes an argument representing the column(s) which have dates that need conversion. In the sample above, I would invoke with "1" as the param, since the 2nd column is the date that needs conversion. More than one column would be represented by a comma separated list.
This is what I've been able to do so far.
#!/usr/bin/perl
my @date_cols = split(/,/, $ARGV[0]);
while (<STDIN>) {
my @fields = split(/\t/, $_, -1);
for (@fields) {
s/^\s+//;
s/\s+\z//;
s/^NULL\z//;
}
for (@fields[@date_cols]) {
##NEED HELP WITH DATE FORMATTING
}
print(join('|', @fields), "\n");
}
Upvotes: 2
Views: 945
Reputation: 7516
Using Time::Piece is simple and easily provides you with the date formmating. The strptime
function lets you define the pattern you want to work with; the strftime
function lets you produce the output format desired. Consider:
use Time::Piece;
my $date = "Apr 1 2012 12:00:00AM";
my $t = Time::Piece->strptime($date,"%b %d %Y %H:%M:%S%p");
print $t->strftime("%Y%m%d\n");
A nice feature of this approach is that it doesn't matter whether one or two spaces separate the month and day fields; the results are the same.
Upvotes: 1
Reputation: 3995
Based on Dave Cross's suggestion above to use Time::Piece:
use Time::Piece;
while (<STDIN>) {
# Split each row into columns by white space
my @fields = split /\s+/;
# Rebuild the date ("Apr 1 2010") from columns 2 through 4
my $time_field = join ' ', @fields[1..3];
# Parse the date - see man strptime
my $date = Time::Piece->strptime($time_field, '%B %d %Y');
# Format the output - see man strftime
print join '|', $fields[0], $date->strftime('%Y%m%d'), $fields[5];
}
Regex is a really excellent tool, but dates are ugly (horrifying even). Whenever possible I prefer to use libraries that already exist for parsing them.
Upvotes: 0