Split my output into multiple files

Question

I have the following list in a CSV file, and my goal is to split this list into directories named YYYY-Month based on the date in each row.

NAME99;2018/06/13;12:27:30
NAME01;2018/06/13;13:03:59
NAME00;2018/06/15;11:33:01
NAME98;2018/06/15;12:22:00
NAME34;2018/06/15;16:58:45
NAME17;2018/06/18;15:51:10
NAME72;2018/06/19;10:06:37
NAME70;2018/06/19;12:44:03
NAME77;2018/06/19;16:36:55
NAME25;2018/06/11;16:32:57
NAME24;2018/06/11;16:32:57
NAME23;2018/06/11;16:37:15
NAME01;2018/06/11;16:37:15
NAME02;2018/06/11;16:37:15
NAME01;2018/06/11;16:37:18
NAME02;2018/06/05;09:51:17
NAME00;2018/06/13;15:04:29
NAME07;2018/06/19;10:02:26
NAME08;2018/06/26;16:03:57
NAME09;2018/06/26;16:03:57
NAME02;2018/06/27;16:58:12
NAME03;2018/07/03;07:47:21
NAME21;2018/07/03;10:53:00
NAMEXX;2018/07/05;03:13:01
NAME21;2018/07/05;15:39:00
NAME01;2018/07/05;16:00:14
NAME00;2018/07/08;11:50:10
NAME07;2018/07/09;14:46:00

What is the smartest way to achieve this without hard-coding a list of output paths to append to?

Currently my program writes the whole list to a single YYYY-Month directory chosen from localtime; it does not look at the date on each line.

Perl

#!/usr/bin/perl

use strict;
use warnings 'all';
use feature qw(say);

use File::Path qw(mkpath);
use File::Spec;
use File::Copy;
use POSIX qw(strftime);

my $OUTPUT_FILE = 'output.csv';
my $OUTFILE     = 'splitted_output.csv';

# Open the input list
open( GL_INPUT, $OUTPUT_FILE ) or die $!;
$/ = "\n\n";    # input record separator

while ( <GL_INPUT> ) {

    chomp;
    my @lines = split /\n/;

    my $i = 0;

    foreach my $lines ( @lines ) {

        # Encapsulate Date/Time
        my ( $name, $y, $m, $d, $time ) =
                $lines[$i] =~ /\A(\w+);(\d+)\/(\d+)\/(\d+);(\d+:\d+:\d+)/;

        # Generate Directory YYYY-Month - #2009-January
        my $dir = File::Spec->catfile( $BASE_LOG_DIRECTORY, "$y-$m" ) ;
        unless ( -e $dir ) {
            mkpath $dir;
        }

        my $log_file_path = File::Spec->catfile( $dir, $OUTFILE );
        open( OUTPUT, '>>', $log_file_path ) or die $!;

        # Here I append value into files
        print OUTPUT join ';', "$y/$m/$d", $time, "$name\n";
        
        $i++;
    }
}

close( GL_INPUT );
close( OUTPUT );

simbabque · Accepted Answer

There is no reason to care about the actual date, or to use date functions at all here. You want to split up your data based on a partial value of one of the columns in the data. That just happens to be the date.

NAME08;2018/06/26;16:03:57   # This goes to 2018-06/
NAME09;2018/06/26;16:03:57   #
NAME02;2018/06/27;16:58:12   #
NAME03;2018/07/03;07:47:21      # This goes to 2018-07/
NAME21;2018/07/03;10:53:00      #
NAMEXX;2018/07/05;03:13:01      #
NAME21;2018/07/05;15:39:00      #

The easiest way to do this is to iterate over your input data and stick each row into a hash keyed by year-month combination. But you're talking about log files, and they might be large, so holding everything in memory is inefficient.
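For comparison, here is a minimal sketch of that hash-based approach. The sub name group_by_month is made up for illustration, and the key is the plain YYYY-MM prefix of the date field:

```perl
use strict;
use warnings;

# Hypothetical helper: group rows by the "YYYY-MM" prefix of their date field.
# Every row is kept in memory, which is why this does not scale to big logs.
sub group_by_month {
    my @rows = @_;
    my %by_month;
    for my $row (@rows) {
        my (undef, $date) = split /;/, $row;
        next unless defined $date && $date =~ m[^(\d{4})/(\d\d)];
        push @{ $by_month{"$1-$2"} }, $row;
    }
    return \%by_month;
}

my $groups = group_by_month(
    "NAME02;2018/06/27;16:58:12\n",
    "NAME03;2018/07/03;07:47:21\n",
    "NAME21;2018/07/03;10:53:00\n",
);

# Report how many rows ended up under each key.
for my $key (sort keys %$groups) {
    printf "%s: %d row(s)\n", $key, scalar @{ $groups->{$key} };
}
```

Writing the files would then be a second loop over the keys, after all input has been read.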

We should work with different file handles instead.

use strict;
use warnings;

my %months = ( 6 => 'June', 7 => 'July' );

my %handles;
while (my $row = <DATA>) {

    # no chomp, we don't actually care about reading the whole row
    my (undef, $dir) = split /;/, $row; # discard name and everything after date

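    # create the YYYY-Month key, e.g. 2018/06/26 becomes 2018-June
    my ($year, $month) = $dir =~ m[^(\d{4})/(\d\d)] or next;
    $dir = $year . '-' . $months{ $month + 0 };    # "06" + 0 matches key 6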

    # open a new handle for this year/month if we don't have it yet
    unless (exists $handles{$dir}) {
        # create the directory (skipped here) ...
        open my $fh, '>', "$dir/filename.csv" or die $!;
        $handles{$dir} = $fh;
    }

    # write out the line to the correct directory
    print { $handles{$dir} } $row;
}

__DATA__
NAME08;2018/06/26;16:03:57
NAME09;2018/06/26;16:03:57
NAME02;2018/06/27;16:58:12
NAME03;2018/07/03;07:47:21
NAME21;2018/07/03;10:53:00
NAMEXX;2018/07/05;03:13:01
NAME21;2018/07/05;15:39:00

I've skipped the part about creating the directory as you already know how to do this.
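For completeness, a rough sketch of that skipped step could look like the following. The sub name open_month_file and the temporary base directory are assumptions for the demo only; make_path comes from the core File::Path module:

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: make sure the directory for a key such as "2018-June"
# exists under $base, then open filename.csv inside it for writing.
sub open_month_file {
    my ($base, $key) = @_;
    my $dir = File::Spec->catdir($base, $key);
    make_path($dir);    # creates missing parents, does nothing if it exists
    my $path = File::Spec->catfile($dir, 'filename.csv');
    open my $fh, '>', $path or die "Cannot open $path: $!";
    return $fh;
}

# Demo in a throwaway directory so nothing is left behind.
my $base = tempdir(CLEANUP => 1);
my $fh   = open_month_file($base, '2018-June');
print {$fh} "NAME08;2018/06/26;16:03:57\n";
close $fh or die "close failed: $!";
```

In the loop above, the returned handle would be what gets stored in %handles.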

This code will also work if your rows of data are not sequential. It's not the most efficient, as the number of open handles grows with the number of distinct year-month combinations in your data, but as long as you don't have hundreds of them open at the same time that does not really matter.
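If the number of open handles ever does become a concern, one option (not part of the code above) is to cap it: close everything when a limit is reached and reopen files in append mode when they are needed again. The limit, the sub name handle_for and the file names below are all assumptions for this sketch:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $MAX_OPEN = 2;    # assumed limit, deliberately tiny for the demo
my (%handles, %seen);

# Hypothetical helper: return a write handle for $path while keeping at most
# $MAX_OPEN handles open. A file is truncated the first time it is opened
# and reopened in append mode after that.
sub handle_for {
    my ($path) = @_;
    return $handles{$path} if exists $handles{$path};

    if (scalar(keys %handles) >= $MAX_OPEN) {
        for my $open (values %handles) {
            close $open or die "close failed: $!";
        }
        %handles = ();
    }

    my $mode = $seen{$path}++ ? '>>' : '>';
    open my $fh, $mode, $path or die "Cannot open $path: $!";
    return $handles{$path} = $fh;
}

# Demo: three files with a limit of two, so the first one gets reopened.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(june july august june)) {
    print { handle_for("$dir/$name.csv") } "row for $name\n";
}

# Close whatever is still open at the end.
for my $open (values %handles) {
    close $open or die "close failed: $!";
}
%handles = ();
```

The price is extra open and close calls whenever the input jumps between many months, so this only pays off when the limit is actually a problem.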

Things of note:

You don't need chomp because you don't care about working with the last field.
You don't need to assign all of the values after split because you don't care about them.
You can discard values by assigning them to undef.
Always use three-argument open and lexical file handles.
The {} in print { ... } $row are needed to tell Perl that this is the handle we are printing to. See http://perldoc.perl.org/functions/print.html.
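The points above can be tried in isolation with a small sketch. It uses an in-memory file handle as a stand-in for a real file, and the hash key demo is arbitrary:

```perl
use strict;
use warnings;

my $row = "NAME08;2018/06/26;16:03:57\n";

# Assigning to undef discards the first field. Fields after the second
# are dropped because the list on the left only has two slots.
my (undef, $date) = split /;/, $row;

# Three-argument open with a lexical handle, here writing into a scalar.
my %handles;
my $buffer = '';
open my $fh, '>', \$buffer or die "Cannot open in-memory file: $!";
$handles{demo} = $fh;

# The block tells Perl that the hash element is the handle, not data to print.
print { $handles{demo} } $row;
close $handles{demo} or die "close failed: $!";

print "date field: $date\n";
print "written:    $buffer";
```

Because the row was never chomped, the copy written through the handle still ends in its original newline.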
