Reputation: 43
I have the following list in a CSV file, and my goal is to split this list into directories named YYYY-Month
based on the date in each row.
NAME99;2018/06/13;12:27:30
NAME01;2018/06/13;13:03:59
NAME00;2018/06/15;11:33:01
NAME98;2018/06/15;12:22:00
NAME34;2018/06/15;16:58:45
NAME17;2018/06/18;15:51:10
NAME72;2018/06/19;10:06:37
NAME70;2018/06/19;12:44:03
NAME77;2018/06/19;16:36:55
NAME25;2018/06/11;16:32:57
NAME24;2018/06/11;16:32:57
NAME23;2018/06/11;16:37:15
NAME01;2018/06/11;16:37:15
NAME02;2018/06/11;16:37:15
NAME01;2018/06/11;16:37:18
NAME02;2018/06/05;09:51:17
NAME00;2018/06/13;15:04:29
NAME07;2018/06/19;10:02:26
NAME08;2018/06/26;16:03:57
NAME09;2018/06/26;16:03:57
NAME02;2018/06/27;16:58:12
NAME03;2018/07/03;07:47:21
NAME21;2018/07/03;10:53:00
NAMEXX;2018/07/05;03:13:01
NAME21;2018/07/05;15:39:00
NAME01;2018/07/05;16:00:14
NAME00;2018/07/08;11:50:10
NAME07;2018/07/09;14:46:00
What is the smartest method to achieve this result without having to create a list of static routes, in which to carry out the append?
Currently my program writes this list to a directory called YYYY-Month
only on the basis of localtime
but does not do anything on each line.
#!/usr/bin/perl
use strict;
use warnings 'all';
use feature qw(say);
use File::Path qw<mkpath>;
use File::Spec;
use File::Copy;
use POSIX qw<strftime>;
my $OUTPUT_FILE = 'output.csv';
my $OUTFILE = 'splitted_output.csv';
# Output to file
open( GL_INPUT, $OUTPUT_FILE ) or die $!;
$/ = "\n\n"; # input record separator
while ( <GL_INPUT> ) {
chomp;
my @lines = split /\n/;
my $i = 0;
foreach my $lines ( @lines ) {
# Encapsulate Date/Time
my ( $name, $y, $m, $d, $time ) =
$lines[$i] =~ /\A(\w+);(\d+)\/(\d+)\/(\d+);(\d+:\d+:\d+)/;
# Generate Directory YYYY-Month - #2009-January
my $dir = File::Spec->catfile( $BASE_LOG_DIRECTORY, "$y-$m" ) ;
unless ( -e $dir ) {
mkpath $dir;
}
my $log_file_path = File::Spec->catfile( $dir, $OUTFILE );
open( OUTPUT, '>>', $log_file_path ) or die $!;
# Here I append value into files
print OUTPUT join ';', "$y/$m/$d", $time, "$name\n";
$i++;
}
}
close( GL_INPUT );
close( OUTPUT );
Upvotes: 1
Views: 99
Reputation: 54373
There is no reason to care about the actual date, or to use date functions at all here. You want to split up your data based on a partial value of one of the columns in the data. That just happens to be the date.
NAME08;2018/06/26;16:03:57 # This goes to 2018-06/ NAME09;2018/06/26;16:03:57 # NAME02;2018/06/27;16:58:12 # NAME03;2018/07/03;07:47:21 # This goes to 2018-07/ NAME21;2018/07/03;10:53:00 # NAMEXX;2018/07/05;03:13:01 # NAME21;2018/07/05;15:39:00 #
The easiest way to do this is to iterate your input data, then stick it into a hash with keys for each year-month combination. But you're talking about log files, and they might be large, so that's inefficient.
We should work with different file handles instead.
use strict;
use warnings;
my %months = ( 6 => 'June', 7 => 'July' );
my %handles;
while (my $row = <DATA>) {
# no chomp, we don't actually care about reading the whole row
my (undef, $dir) = split /;/, $row; # discard name and everything after date
# create the YYYY-MM key
$dir =~ s[^(....)/(..)][$1-$months{$2}];
# open a new handle for this year/month if we don't have it yet
unless (exists $handles{$dir}) {
# create the directory (skipped here) ...
open my $fh, '>', "$dir/filename.csv" or die $!;
$handles{$dir} = $fh;
}
# write out the line to the correct directory
print { $handles{$dir} } $row;
}
__DATA__
NAME08;2018/06/26;16:03:57
NAME09;2018/06/26;16:03:57
NAME02;2018/06/27;16:58:12
NAME03;2018/07/03;07:47:21
NAME21;2018/07/03;10:53:00
NAMEXX;2018/07/05;03:13:01
NAME21;2018/07/05;15:39:00
I've skipped the part about creating the directory as you already know how to do this.
This code will also work if your rows of data are not sequential. It's not the most efficient as the number of handles will grow the more data you have, but as long you don't have 100s of them at the same time that does not really matter.
Things of note:
chomp
because you don't care about working with the last field.split
because you don't care about them. undef
.open
and lexical file handles. {}
in print { ... } $row
are needed to tell Perl that this is the handle we are printing too. See http://perldoc.perl.org/functions/print.html.Upvotes: 3