Reputation: 847
I'm trying to read multiple files that share the same format, and I want to compute some statistics from them using a regex.
For example, I want to count identical items that appear inside the []:
NC_013618 NC_013633 ([T(nad6 trnE ,cob trnT ,)])
C_013481 NC_013479 ([T(trnP ,rrnS trnF trnV rrnL nad1 trnI ,)])
NC_013485 NC_003159 ([T(trnC ,trnY ,)])
NC_013554 NC_013254 ([T(trnR ,trnN ,)])
NC_013607 NC_013618 ([T(nad6 trnE ,cob trnT ,)])
The problem is that I'm not getting the right values. Below is my code:
use strict;
use warnings;
my %data;
@FILES = glob("../mitos-crex/*.out");
foreach my $file (@FILES) {
local $/ = undef;
open my $fh, '<', $file;
$data{$file} = <$fh>;
}
my @t;
my $c = 0;
foreach my $line (keys %data) {
foreach my $l ($data{$line}) {
print $l."\n";
($t[$c]) = $l =~ m/(\[.*\])/;
$c++;
}
}
#the problem is here the counter is not giving the right value
print $c;
my %counts;
$counts{$_}++ for @t;
Thanks in advance.
Upvotes: 0
Views: 360
Reputation: 126722
First of all, always use strict and use warnings. This measure is vital for all programming, as it will quickly reveal simple problems that you might otherwise overlook or waste time debugging. It is especially important, and a simple courtesy, when you are asking others for help with your program.
You seem to have confused slurping an entire file into a single string with reading it into an array of lines. The way you have written it, each element $data{$file} is a single scalar value containing all of the file's data. You then try to iterate over it with foreach $l ($data{$line}) { ... }, which executes just once and so finds only the first [...] string in the file.
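To make the difference concrete, here is a minimal sketch that reads the same data both ways. It uses an in-memory filehandle and made-up sample text (the variable names are mine, not from the question), so treat it as an illustration rather than a drop-in fix:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for one *.out file.
my $text = "A B ([T(trnC ,trnY ,)])\nC D ([T(trnR ,trnN ,)])\n";

# Slurp mode: with $/ undefined, <$fh> returns the whole file as one string.
my $whole = do {
    local $/;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    <$fh>;
};

# List context with the default $/: <$fh> returns one element per line.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @lines = <$fh>;
close $fh;

# A foreach over a single scalar runs its body exactly once.
my $passes = 0;
$passes++ for ($whole);

printf "slurped passes: %d, lines: %d\n", $passes, scalar @lines;
# prints "slurped passes: 1, lines: 2"
```

The loop over the slurped scalar runs once no matter how many lines the file has, which is why only one match per file was being captured.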
Ordinarily I would say that you shouldn't read all of your file data into memory this way, as the problem is likely to have a better streamed solution. However, I don't know what else you want to use the captured data for, so my solution follows your own design.
I think you need to slurp the data into an array instead of a scalar, and then iterate over that in your loops. You must leave $/ defined so that the file is read in lines, and build an anonymous array with [ <$fh> ]. Then you can iterate over the lines with foreach my $line (@{ $data{$file} }) { ... }
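use strict;
use warnings;

my %data;
my @files = glob("../mitos-crex/*.out");

# Read each file as a list of lines, stored as an array reference
foreach my $file (@files) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    $data{$file} = [ <$fh> ];
}

my $c = 0;
my @t;

# Capture the [...] group from every line of every file
foreach my $file (keys %data) {
    foreach my $line (@{ $data{$file} }) {
        ($t[$c]) = $line =~ /(\[.*\])/;
        $c++;
    }
}

print "$c\n";

# Count how often each captured group occurs
my %counts;
$counts{$_}++ for @t;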
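For comparison, the streamed approach mentioned earlier might look something like the sketch below. It assumes you only need the per-group counts and nothing else from the files. The helper name count_groups is hypothetical, and the glob pattern is the one from the question:

```perl
use strict;
use warnings;

# Count every [...] group, reading one line at a time instead of
# holding whole files in memory. count_groups is a hypothetical helper name.
sub count_groups {
    my @files = @_;
    my %counts;
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while (my $line = <$fh>) {
            # Skip lines that contain no bracketed group
            next unless $line =~ /(\[.*\])/;
            $counts{$1}++;
        }
        close $fh;
    }
    return %counts;
}

my %counts = count_groups(glob("../mitos-crex/*.out"));

# Report each distinct group with the number of times it appeared
for my $group (sort keys %counts) {
    print "$counts{$group}\t$group\n";
}
```

Unlike the version above, this skips lines that have no [...] group, so no undefined values end up in the counts.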
Upvotes: 3
Reputation: 67900
The counter is giving a correct value. Your problem is that you are slurping the file (reading it all in at once), but then only storing the first value found:
($t[$c]) = $data{$line} =~ m/(\[.*\])/; # only finds first value in file
Either loop over each file properly, and use the above regex for each line, or do something like:
push @t, ($data{$line} =~ m/(\[.*\])/g);
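As a rough illustration of why the /g version works on a slurped string, here is a small self-contained sketch. The sample text is made up from the lines in the question, and the variable names are mine:

```perl
use strict;
use warnings;

# Hypothetical slurped contents of one file (three lines in one string).
my $slurped = join "\n",
    'NC_013618 NC_013633 ([T(nad6 trnE ,cob trnT ,)])',
    'NC_013485 NC_003159 ([T(trnC ,trnY ,)])',
    'NC_013607 NC_013618 ([T(nad6 trnE ,cob trnT ,)])';

# Without /g: only the first match is captured.
my ($first) = $slurped =~ m/(\[.*\])/;

# With /g in list context: every match is returned. Because "." does not
# match a newline by default, each match stays within its own line.
my @all = $slurped =~ m/(\[.*\])/g;

my %counts;
$counts{$_}++ for @all;

print scalar(@all), " matches, ", scalar(keys %counts), " distinct\n";
# prints "3 matches, 2 distinct"
```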
You should always use
use strict;
use warnings;
And fix the errors and warnings that result. Not doing so is a bad idea: it only hides the problems in your code rather than solving them.
Also, you should be aware that this statement:
foreach $l ($data{$line}) {
only iterates once, because each "line" here is an entire file, and $data{$line} is a single scalar value in any case. Moreover, you iterate using $l as an alias, but you still use $data{$line} inside the loop, which makes the loop completely redundant.
Upvotes: 0