RossCampbell

Reputation: 103

Perl regex skipping every other line

I am trying to extract the first complete number on each line from a text file like this:

8 gcaggcaaactgcgataataaaaggctgtttcaacagcggagtggattgt 1.5307684822361e-176
11 tttacccagtgagtttgaagcaaggatcttttagtttaccgaaaaatgag 3.22210306380202e-293
14 agcaatagcgcgaacagacaacctcatcagtctaccgcgcaccctttccc 1.32107737963584e-52
20 agtgacagggaaaggcgatcgcggctttacgatcagagatcggtgtcggt 0.942504155078175
30 tccggagactttcgattgcatgcaattcaccatcataccctcttgccctc 0
45 actgagcccctgacgctggccagtgtagcgctgtgaagtcccctctcagg 9.49147409471272e-307
53 gaaccgagcgatcgctgctgccattgtctcgccttctgccgaggaatgcc 2.15850303270505e-28

using the regex in the following code:

my $id = undef;
while (my $line = <INFILE>){
  chomp $line;
  if ($line =~ /\A([0-9]+)/){
      $id = $1;
  }
print OUTFILE "$id\n";
$line = <INFILE>;
chomp $line;
}

The output I'm getting only includes every other line:

8
14
30
53

I've tried printing out every line without doing the match, and everything is there. Once I add the regex, it skips every other line. Any ideas why it's doing this?

Upvotes: 3

Views: 1023

Answers (2)

mathematician1975

Reputation: 21351

You are skipping lines of the file:

   while (my $line = <INFILE>) {   # Reading line once
       chomp $line;   
       if ($line =~ /\A([0-9]+)/){       
          $id = $1;   
       } 
       print OUTFILE "$id\n";
       $line = <INFILE>;   # Reading line again!!!!!

   }

because you are calling

   $line = <INFILE>; 

twice per loop iteration, and each call reads the next line from the file. You do not need the second $line = <INFILE> in your code.
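
To see the effect in isolation, here is a minimal sketch. It assumes an in-memory filehandle and abbreviated sample rows in place of the real INFILE; the names $in, @seen and $discarded are made up for this illustration. Every evaluation of <$in> in scalar context returns the next line, so the second read throws away the line that the following iteration would otherwise have processed.

```perl
use strict;
use warnings;

# Abbreviated sample rows (sequences and values shortened for illustration).
my $data = <<'END';
8 gcaggc 1.5e-176
11 tttacc 3.2e-293
14 agcaat 1.3e-52
20 agtgac 0.94
END

# Open an in-memory filehandle so the example needs no external file.
open my $in, '<', \$data or die "Cannot open in-memory file: $!";

my @seen;
while (my $line = <$in>) {      # first read: consumes one line
    push @seen, $1 if $line =~ /\A([0-9]+)/;
    my $discarded = <$in>;      # second read: consumes the NEXT line
}
close $in;

print "$_\n" for @seen;         # prints 8 and 14; 11 and 20 were discarded
```

Deleting the line marked "second read" makes the loop visit all four rows.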

Upvotes: 1

Dancrumb

Reputation: 27589

You're reading from the INFILE handle twice: once in the while condition and once at the end of the loop.

Remove the final read:

my $id = undef;
while (my $line = <INFILE>){
  chomp $line;
  if ($line =~ /\A([0-9]+)/){
      $id = $1;
  }
  print OUTFILE "$id\n";
}
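
If you want something you can run on its own, here is one possible sketch of the same loop. It makes a few assumptions that are not in the question: it uses a lexical filehandle opened on an in-memory string instead of INFILE, abbreviated sample rows, and a hypothetical helper named leading_ids. It also differs from the code above in one respect: lines without a leading number are skipped rather than printing the previous $id again.

```perl
use strict;
use warnings;

# Hypothetical helper: return the leading integer of every line that has one.
sub leading_ids {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {          # the only read in the loop
        chomp $line;
        push @ids, $1 if $line =~ /\A([0-9]+)/;
    }
    return @ids;
}

# Abbreviated sample rows, including one blank line that has no number.
my $data = "8 gcaggc 1.5e-176\n11 tttacc 3.2e-293\n\n14 agcaat 1.3e-52\n";

open my $in, '<', \$data or die "Cannot open in-memory file: $!";
print "$_\n" for leading_ids($in);      # prints 8, 11 and 14
close $in;
```

For a real file, you would open it with the three-argument form, for example open my $in, '<', $path or die "...", and pass that handle to the helper.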

Upvotes: 4
