Someone_1313

Reputation: 432

How to count the number of elements in parts of a text file using a loop in Perl?

I'm looking for a way to write a Perl script that counts the elements in my text file, part by part. For example, my text file has this form:

ID                       Position   Potential  Jury agreement NGlyc result
(PART 1)
NP_073551.1_HCoV229Egp2   23 NTSY   0.5990     (8/9)           +     
NP_073551.1_HCoV229Egp2   62 NTSS   0.7076     (9/9)           ++        
NP_073551.1_HCoV229Egp2  171 NTTI   0.5743     (5/9)           +     
...
(PART 2)
QJY77946.1_NA             20 NGTN   0.7514     (9/9)           +++   
QJY77946.1_NA             23 NTSH   0.5368     (5/9)           +     
QJY77946.1_NA             51 NFSF   0.7120     (9/9)           ++    
QJY77946.1_NA             62 NTSS   0.6947     (9/9)           ++  
...
(PART 3)
QJY77954.1_NA             20 NGTN   0.7694     (9/9)           +++   
QJY77954.1_NA             23 NTSH   0.5398     (5/9)           +     
QJY77954.1_NA             51 NFSF   0.7121     (9/9)           ++      
...
(PART N°...)

As you can see, the ID is the same within each part (one ID for PART 1, another for PART 2, and so on). Only the columns Position, Potential, Jury agreement, and NGlyc result change. My main goal is to count the lines with Potential >= 0.7.

With this in mind, I'm looking for output like this:

Part 1: 
1 (one value 0.7 >=)
Part 2: 
2 (two values 0.7 >=)
Part 3: 
2 (two values 0.7 >=)
Part N°:
X numbers of values 0.7 >= 

This output tells me the number of positive values (Potential >= 0.7) for each ID.

I believe the pseudocode would be something like this:

foreach ID in LIST
    foreach LINE in FILE
        if (ID is in LINE)
           ... count the line ...
    end foreach LINE
end foreach ID

I'm looking for any suggestions (a package or a script idea) or comments that would help me write a better script.

Thanks! Best!

Upvotes: 0

Views: 178

Answers (1)

wsdookadr

Reputation: 2662

To count, for each part, the lines that satisfy a condition on a given column, you can loop over the lines, skip the header, parse the part number from each "(PART n)" marker, and use an array indexed by part number to count the matching lines.

After that, loop over the counts recorded in the array and print them in the format you need.

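#!/usr/bin/perl
use strict;
use warnings;

my $part = 0;
my @cnt_part;
while (my $line = <STDIN>) {
    if ($. == 1) {
        # Skip the header row.
        next;
    } elsif ($line =~ m{^\(PART (\d+)\)}) {
        # A "(PART n)" marker starts a new part. Start its count at 0
        # so that a part without any match still prints 0.
        $part = $1;
        $cnt_part[$part] //= 0;
    } else {
        # Data rows have exactly 6 whitespace-separated columns;
        # Potential is the 4th one.
        my @cols = split(m{\s+}, $line);
        if (@cols == 6) {
            my $potential = $cols[3];
            if (0.7 <= $potential) {
                $cnt_part[$part]++;
            }
        }
    }
}

# Part numbers start at 1, so index 0 is not printed.
for (my $i = 1; $i <= $#cnt_part; $i++) {
    my $count = $cnt_part[$i] // 0;
    print "Part $i:\n";
    print "$count (values 0.7 <=)\n";
}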

To run it, just pipe the entire file through the Perl script:

cat in.txt | perl count.pl

and you get an output like this:

Part 1:
1 (values 0.7 <=)
Part 2:
2 (values 0.7 <=)
Part 3:
2 (values 0.7 <=)
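
If the "(PART n)" lines in the question are only annotations and the real file has no such markers, the same count can be keyed on the ID column instead. Here is a minimal sketch under that assumption; count_by_id and the in-memory sample are illustrative names that are not part of the script above, and the rows are copied from the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count, per ID, the rows whose Potential column is >= $threshold.
# Returns the IDs in order of first appearance and a hash of counts.
sub count_by_id {
    my ($fh, $threshold) = @_;
    my (@order, %count);
    while (my $line = <$fh>) {
        my @cols = split ' ', $line;            # awk-style split on whitespace
        next unless @cols == 6;                 # skips "...", blank and marker lines
        next unless $cols[3] =~ /^\d*\.?\d+$/;  # skips rows without a numeric Potential
        my $id = $cols[0];
        if (!exists $count{$id}) {
            push @order, $id;
            $count{$id} = 0;                    # IDs with no hit still report 0
        }
        $count{$id}++ if $cols[3] >= $threshold;
    }
    return (\@order, \%count);
}

# Sample rows from the question; a real run would open the file instead.
my $sample = <<'END';
ID                       Position   Potential  Jury agreement NGlyc result
NP_073551.1_HCoV229Egp2   23 NTSY   0.5990     (8/9)           +
NP_073551.1_HCoV229Egp2   62 NTSS   0.7076     (9/9)           ++
NP_073551.1_HCoV229Egp2  171 NTTI   0.5743     (5/9)           +
QJY77946.1_NA             20 NGTN   0.7514     (9/9)           +++
QJY77946.1_NA             23 NTSH   0.5368     (5/9)           +
QJY77946.1_NA             51 NFSF   0.7120     (9/9)           ++
QJY77946.1_NA             62 NTSS   0.6947     (9/9)           ++
QJY77954.1_NA             20 NGTN   0.7694     (9/9)           +++
QJY77954.1_NA             23 NTSH   0.5398     (5/9)           +
QJY77954.1_NA             51 NFSF   0.7121     (9/9)           ++
END

open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my ($order, $count) = count_by_id($fh, 0.7);
close $fh;

print "$_: $count->{$_}\n" for @$order;
```

A hash keyed on the ID avoids the nested "foreach ID / foreach LINE" loops from the question's pseudocode, since the file is read only once.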

If you also want to display the counts as words, you can use Lingua::EN::Numbers (see this program) and get output very similar to the one in your post:

Part 1:
1 (one values 0.7 <=)
Part 2:
2 (two values 0.7 <=)
Part 3:
2 (two values 0.7 <=)
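
The linked program is not reproduced here, so the following is only a rough sketch of how the word form could be produced. It assumes Lingua::EN::Numbers (a CPAN module, not part of core Perl) may or may not be installed, and count_label is a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load Lingua::EN::Numbers only if it is installed; otherwise fall back
# to printing digits alone.
my $have_words = eval { require Lingua::EN::Numbers; 1 };

# Hypothetical helper: build one count line, with the number spelled out
# in English when the module is available.
sub count_label {
    my ($n) = @_;
    return "$n (values 0.7 <=)" unless $have_words;
    my $word = Lingua::EN::Numbers::num2en($n);
    return "$n ($word values 0.7 <=)";
}

print count_label($_), "\n" for 1, 2;
```

In the counting script, the second print statement in the final loop would then call count_label on the stored count.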

All the code in this post is also available here.

Upvotes: 2
