Mariya

Reputation: 847

perl hash array reading from files

I'm trying to read multiple files that share the same format, and I want to compute some statistics based on a regex.

i.e. I want to count similar items that are within the [], as in:

NC_013618 NC_013633 ([T(nad6 trnE ,cob trnT ,)])
C_013481 NC_013479 ([T(trnP ,rrnS trnF trnV rrnL nad1 trnI ,)])
NC_013485 NC_003159 ([T(trnC ,trnY ,)])
NC_013554 NC_013254 ([T(trnR ,trnN ,)])
NC_013607 NC_013618 ([T(nad6 trnE ,cob trnT ,)])

The problem is that I'm not getting the right values. Below is my code:

use strict;
use warnings;

my %data;
@FILES = glob("../mitos-crex/*.out");
foreach my $file (@FILES) {
    local $/ = undef;
    open my $fh, '<', $file;
    $data{$file} = <$fh>;
}

my @t;
my $c = 0;
foreach my $line (keys %data) {
    foreach my $l ($data{$line}) {
         print $l."\n";
        ($t[$c]) = $l =~ m/(\[.*\])/;

        $c++;
    }
}

# the problem is here: the counter is not giving the right value

print $c;
my %counts;
$counts{$_}++ for @t;

Thanks in advance.

Upvotes: 0

Views: 360

Answers (2)

Borodin

Reputation: 126722

First of all, always use strict and use warnings. This measure is vital for all programming, as it will quickly reveal simple problems that you might otherwise overlook or waste time debugging. It is especially important, and a simple courtesy, if you are asking others for help with your program.

You seem to have become confused between slurping an entire file into a single string and slurping it into an array of lines. The way you have written it, each element $data{$file} is a single scalar value containing all of the file's data, and then you try to iterate over it with foreach my $l ($data{$line}) { ... }, which executes just once and so only finds the first [...] string in the file.

Ordinarily I would say that you shouldn't read in all of your file data in this way, as the problem is likely to have a better streaming solution, but I don't know what else you want to use the captured data for, so my solution follows your own design.

I think you need to slurp the data into an anonymous array instead of a scalar, and then iterate over that in your loops. You must leave $/ defined so that the file is read in lines, and build an anonymous array with [ <$fh> ]. Then you can iterate over the lines with foreach my $line (@{ $data{$file} }) { ... }.

use strict;
use warnings;

my %data;

my @files = glob("../mitos-crex/*.out");

foreach my $file (@files) {
    open my $fh, '<', $file or die $!;
    $data{$file} = [ <$fh> ];    # reference to an array of the file's lines
}

my $c = 0;
my @t;
foreach my $file (keys %data) {
    foreach my $line (@{ $data{$file} }) {
        ($t[$c]) = $line =~ /(\[.*\])/;    # capture this line's [...] group, if any
        $c++;
    }
}

print $c;
my %counts;
$counts{$_}++ for @t;
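
If you then want to see the statistics themselves, one straightforward way (just a sketch; adapt the formatting to whatever report you need) is to print %counts sorted by frequency:

# Report each captured [...] group with its tally, most frequent first
for my $item ( sort { $counts{$b} <=> $counts{$a} } keys %counts ) {
    printf "%4d  %s\n", $counts{$item}, $item;
}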

Upvotes: 3

TLP

Reputation: 67900

The counter is giving a correct value. Your problem is that you are slurping the file (reading it all in at once), but then only storing the first value found:

($t[$c]) = $data{$line} =~ m/(\[.*\])/;  # only finds first value in file

Either loop over each file properly, and use the above regex for each line, or do something like:

push @t, ($data{$line} =~ m/(\[.*\])/g);
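
For instance, a minimal sketch of that second approach, reusing the glob pattern and %data layout from your question (untested against your real files), would be:

use strict;
use warnings;

my %data;
for my $file ( glob("../mitos-crex/*.out") ) {
    local $/ = undef;                        # slurp the whole file into one string
    open my $fh, '<', $file or die "Cannot open $file: $!";
    $data{$file} = <$fh>;
}

# In list context, /g returns every captured [...] group, not just the first
my @t;
for my $file ( keys %data ) {
    push @t, $data{$file} =~ m/(\[.*\])/g;
}

my %counts;
$counts{$_}++ for @t;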

You should always use

use strict;
use warnings;

And solve the errors/warnings that result. Not doing so is a bad idea; it only hides the problems in your code instead of solving them.

Also, you should be aware that this statement:

foreach $l ($data{$line}) {

Only iterates once, because each "line" here is an entire file, and $data{$line} is, moreover, a single scalar value. Since the list you loop over has just that one element, the foreach is completely redundant: $l is simply an alias for $data{$line}, which you could use directly.
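
A tiny example (made-up data, just to illustrate the point):

my %data = ( 'file.out' => "several\nlines\nof text\n" );
foreach my $l ( $data{'file.out'} ) {
    print "one pass\n";    # printed exactly once: the list contains a single scalar
}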

Upvotes: 0
