Mark Clements
Mark Clements

Reputation: 465

How to write each row of a text file into a CSV row

I am new to perl and I am trying to take a file (it's actually an .idx file) that is formatted like this

 Monday       Foo Name             43212    
 Tuesday      Name Foo Foo         43252
 Tuesday      Name                 50322 
 Wednesday    Foo Name             53221
 Thursday     Foo Bar Foo Name     24353

and I want to output it as a csv file. The file should look exactly like this, except in CSV format so that Excel can read it. Also, I only want the rows that are on Tuesday to be included so that the CSV file will look like

 Tuesday      Name Foo Foo     43252
 Tuesday      Name             50322

In Excel. I also have several idx files in the format formYYYY_Q.idx where YYYY refers to the year and Q refers to the quarter. I would like to loop over all the .idx files I have and create one large CSV file with only rows in each .idx file with Tuesday at the beginning. The code I have so far is

 #!/usr/bin/perl

 use warnings;
 use strict;
 use Text::CSV;

 my $csvfile= Text::CSV->new({binary=>1,auto_diag=>1});
 $csvfile->column_names("Day","Name","Number");

 my @datalines;
 my $idxfile="form1994_1.idx";

 open(INFILE, "< $idxfile") or die $!;
 open(my $outfile, "> Master.csv") or die $!;

 while(<INFILE>){

      if(/^Tuesday/){

            chomp($_);
            push(@nsarlines, $_);

     }

 }

When I replace the open command with open(OUTFILE, "> Master.txt") or die $!; and include this in the last line of code outside the while loop:

 print OUTFILE map {"$_ \n"} @nsarlines;

the Master.txt file looks looks like I want it to

 Tuesday      Name Foo Foo     43252
 Tuesday      Name             50322

However, if I use the open command as written above do something like this outside the while loop:

 $csvfile-> print($outfile, \@test);

I get a CSV file with the entire $_ string, which is one row of the .idx file in each cell of the Master.csv file. I'm having difficulty figuring out how to make perl make each .idx row a CSV row WIHTOUT manually inserting commas into $_ (not an elegant or desirable option).

The second thing I need to do is I have files formYYY_Q.idx all in the same directory and I would like to automatically go through each one, pull out the rows that start with Tuesday and add those to the Master.csv file (or rather, do all of that and write the Master.csv file once at the end). I think File::Find might be able to do this, but I haven't been able to figure out how to use it.

Thanks for your help.

Upvotes: 2

Views: 1097

Answers (3)

Kenosis
Kenosis

Reputation: 6204

I applaud the clear description of your problem and your attempted solution!

Given your narrative, including having all of your idx files in one directory, consider the following solution--to be executed in the *.idx-containing directory:

use strict;
use warnings;

open my $outfile, '>', 'Master.csv' or die $!;
print $outfile "Day,Name,Number\n";

for my $idxfile (<*.idx>) {
    next unless $idxfile =~ /^form\d{4}_\d\.idx/;
    open my $infile, '<', $idxfile or die $!;

    while (<$infile>) {
        if (/^Tuesday/) {
            my ( $day, $name, $num ) = /(\w+)\s+(.+?)\s+(\d+)/;
            print $outfile "$day,$name,$num\n";
        }
    }

    close $infile;
}

close $outfile;

The header is first written to the Master.csv file. The <*.idx> construct is a glob that generates a list of *.idx files from the current directory. Next, a regex is used to insure that the filename matches your naming specs. If only those files you want processed are in the directory, you can delete this line.

The current idx file is opened and processed. As in your code, a regex is used to check for "Tuesday" at the beginning of the line. If such a line is encountered, a regex captures the three fields:

/(\w+)\s+(.+?)\s+(\d+)/
   ^   ^  ^    ^   ^
   |   |  |    |   |
   |   |  |    |   + - One+ digits - Number
   |   |  |    + - One+ whitespaces
   |   |  + - One+ any characters (except newline) - Name
   |   + - One+ whitespaces 
   + - One+ 'word' characters - Day

These captured fields--with commas separating them--are written to the Master.csv file. When the current idx file is completely read, it's closed, and the next idx file is processed--if any. Finally, the Master.csv file is closed.

Hope this helps!

Upvotes: 0

jimtut
jimtut

Reputation: 2393

Combined some of the op's code and some of avitevet's, and came up with this:

#!/usr/bin/perl
use warnings;
use strict;

opendir(DIR, ".") or die $!;
my @idxfiles = sort(grep {/\.idx$/} readdir(DIR));
closedir(DIR);

open(OUT, "> Master.csv") or die $!;
foreach my $idx (@idxfiles) {
  open(F, "$idx") or die $!;
  while (<F>) {
    if (m/^Tuesday/) {
      my @fields = split(/\s+/);
      my $day = shift(@fields); # grab the first one
      my $zip = pop(@fields); # grab the last one;
      my $middle = join(" ", @fields); # merge the rest back together
      print OUT "$day,$middle,$zip\n";
    }
  }
  close(F)
}
close(OUT);

Upvotes: 1

Avi Tevet
Avi Tevet

Reputation: 828

There are 2 ways to use File::Find. One is you use the wanted function to add data about the files to a global list/queue/variable, then process the data after the find call. The other way is to perform all the processing in the wanted function.

I personally don't like using globals to pass data out of functions but unfortunately with File::Find that is your option. Here's an example where they do that: http://www.perlmonks.org/?node_id=217378. In the example, they use the %size hash to pass data out of the anonymous wanted function. This is probably your best option, you would add the matching filenames to a global list, then iterate through the list, writing data from each file to your CSV.

The other option is to perform your processing in the wanted function. But again this is suboptimal, because you will need to use globals to pass information into the wanted function about the open CSV file.

Another option, assuming that all your .idx files can be guaranteed to be in the same directory (rather than in the same dir tree) is to use the opendir & readdir functions. http://perldoc.perl.org/functions/readdir.html

my $dh;   # directory handle
opendir($dh, $your_dir) || die $!;
my @idxfiles = grep {/\.idx$/} readdir($dh);
closedir($dh);

foreach my $idxfile (@idxfiles) {
   open(INFILE, "< $idxfile") or die $!;
   ... the rest of your code ...
}

Upvotes: 2

Related Questions