Reputation: 465
I am new to perl and I am trying to take a file (it's actually an .idx
file) that is formatted like this
Monday Foo Name 43212
Tuesday Name Foo Foo 43252
Tuesday Name 50322
Wednesday Foo Name 53221
Thursday Foo Bar Foo Name 24353
and I want to output it as a csv file. The file should look exactly like this, except in CSV format so that Excel can read it. Also, I only want the rows that are on Tuesday to be included so that the CSV file will look like
Tuesday Name Foo Foo 43252
Tuesday Name 50322
In Excel. I also have several idx
files in the format formYYYY_Q.idx
where YYYY
refers to the year and Q
refers to the quarter. I would like to loop over all the .idx
files I have and create one large CSV file with only rows in each .idx
file with Tuesday
at the beginning. The code I have so far is
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;
my $csvfile= Text::CSV->new({binary=>1,auto_diag=>1});
$csvfile->column_names("Day","Name","Number");
my @datalines;
my $idxfile="form1994_1.idx";
open(INFILE, "< $idxfile") or die $!;
open(my $outfile, "> Master.csv") or die $!;
while(<INFILE>){
if(/^Tuesday/){
chomp($_);
push(@nsarlines, $_);
}
}
When I replace the open command with open(OUTFILE, "> Master.txt") or die $!;
and include this in the last line of code outside the while loop:
print OUTFILE map {"$_ \n"} @nsarlines;
the Master.txt file looks looks like I want it to
Tuesday Name Foo Foo 43252
Tuesday Name 50322
However, if I use the open command as written above do something like this outside the while loop:
$csvfile-> print($outfile, \@test);
I get a CSV file with the entire $_
string, which is one row of the .idx
file in each cell of the Master.csv file. I'm having difficulty figuring out how to make perl make each .idx
row a CSV row WIHTOUT manually inserting commas into $_
(not an elegant or desirable option).
The second thing I need to do is I have files formYYY_Q.idx
all in the same directory and I would like to automatically go through each one, pull out the rows that start with Tuesday and add those to the Master.csv file (or rather, do all of that and write the Master.csv file once at the end). I think File::Find
might be able to do this, but I haven't been able to figure out how to use it.
Thanks for your help.
Upvotes: 2
Views: 1097
Reputation: 6204
I applaud the clear description of your problem and your attempted solution!
Given your narrative, including having all of your idx files in one directory, consider the following solution--to be executed in the *.idx-containing directory:
use strict;
use warnings;
open my $outfile, '>', 'Master.csv' or die $!;
print $outfile "Day,Name,Number\n";
for my $idxfile (<*.idx>) {
next unless $idxfile =~ /^form\d{4}_\d\.idx/;
open my $infile, '<', $idxfile or die $!;
while (<$infile>) {
if (/^Tuesday/) {
my ( $day, $name, $num ) = /(\w+)\s+(.+?)\s+(\d+)/;
print $outfile "$day,$name,$num\n";
}
}
close $infile;
}
close $outfile;
The header is first written to the Master.csv file. The <*.idx>
construct is a glob that generates a list of *.idx files from the current directory. Next, a regex is used to insure that the filename matches your naming specs. If only those files you want processed are in the directory, you can delete this line.
The current idx file is opened and processed. As in your code, a regex is used to check for "Tuesday" at the beginning of the line. If such a line is encountered, a regex captures the three fields:
/(\w+)\s+(.+?)\s+(\d+)/
^ ^ ^ ^ ^
| | | | |
| | | | + - One+ digits - Number
| | | + - One+ whitespaces
| | + - One+ any characters (except newline) - Name
| + - One+ whitespaces
+ - One+ 'word' characters - Day
These captured fields--with commas separating them--are written to the Master.csv file. When the current idx file is completely read, it's closed, and the next idx file is processed--if any. Finally, the Master.csv file is closed.
Hope this helps!
Upvotes: 0
Reputation: 2393
Combined some of the op's code and some of avitevet's, and came up with this:
#!/usr/bin/perl
use warnings;
use strict;
opendir(DIR, ".") or die $!;
my @idxfiles = sort(grep {/\.idx$/} readdir(DIR));
closedir(DIR);
open(OUT, "> Master.csv") or die $!;
foreach my $idx (@idxfiles) {
open(F, "$idx") or die $!;
while (<F>) {
if (m/^Tuesday/) {
my @fields = split(/\s+/);
my $day = shift(@fields); # grab the first one
my $zip = pop(@fields); # grab the last one;
my $middle = join(" ", @fields); # merge the rest back together
print OUT "$day,$middle,$zip\n";
}
}
close(F)
}
close(OUT);
Upvotes: 1
Reputation: 828
There are 2 ways to use File::Find. One is you use the wanted
function to add data about the files to a global list/queue/variable, then process the data after the find
call. The other way is to perform all the processing in the wanted
function.
I personally don't like using globals to pass data out of functions but unfortunately with File::Find that is your option. Here's an example where they do that: http://www.perlmonks.org/?node_id=217378. In the example, they use the %size hash to pass data out of the anonymous wanted
function. This is probably your best option, you would add the matching filenames to a global list, then iterate through the list, writing data from each file to your CSV.
The other option is to perform your processing in the wanted
function. But again this is suboptimal, because you will need to use globals to pass information into the wanted
function about the open CSV file.
Another option, assuming that all your .idx files can be guaranteed to be in the same directory (rather than in the same dir tree) is to use the opendir & readdir functions. http://perldoc.perl.org/functions/readdir.html
my $dh; # directory handle
opendir($dh, $your_dir) || die $!;
my @idxfiles = grep {/\.idx$/} readdir($dh);
closedir($dh);
foreach my $idxfile (@idxfiles) {
open(INFILE, "< $idxfile") or die $!;
... the rest of your code ...
}
Upvotes: 2