Reputation: 59

line-by-line comparison only returns every second line

so I'm trying to write a script that compares each line of one file with its corresponding line of file 2 (line 1 of file 1 with line 1 of file 2, etc), change the content of file 1 in a specific way, and then return this line. The code I compiled so far goes like this:

#!/usr/bin/perl
# alleles.pl
use strict; use warnings;
use List::MoreUtils qw(uniq);

open my $AL, '<', shift or die $!;
open my $HMP, '<', shift or die $!;

<$AL>; # skip header
print scalar <$HMP>; # print header

while (<$AL>, <$HMP>) {
  my @HMP_columns = split (/\t/, <$HMP>);
  my @AL_columns = split (/\t/, <$AL>);
  shift @AL_columns for 0..4;
  my @uniq_AL_columns = uniq(@AL_columns); # only keep unique values in new array
  for (my $i = 11; $i < scalar(@HMP_columns); $i++) { # for every column (starting from column 12)
    if ($HMP_columns[$i] == "1") {$HMP_columns[$i] = $uniq_AL_columns[0]}
    elsif ($HMP_columns[$i] == "2") {$HMP_columns[$i] = $uniq_AL_columns[1]}
    elsif ($HMP_columns[$i] == "3") {$HMP_columns[$i] = $uniq_AL_columns[2]}
    elsif ($HMP_columns[$i] == "4") {$HMP_columns[$i] = $uniq_AL_columns[3]}
    elsif ($HMP_columns[$i] == "5") {$HMP_columns[$i] = $uniq_AL_columns[4]}
    elsif ($HMP_columns[$i] == "6") {$HMP_columns[$i] = $uniq_AL_columns[5]}
    elsif ($HMP_columns[$i] == "7") {$HMP_columns[$i] = $uniq_AL_columns[6]}
    elsif ($HMP_columns[$i] == "8") {$HMP_columns[$i] = $uniq_AL_columns[7]}
    elsif ($HMP_columns[$i] == "9") {$HMP_columns[$i] = $uniq_AL_columns[8]}
  } 
  my $joined_HMP = join ("\t", @HMP_columns); # get back to tab separated lines
  print "$joined_HMP\n";
}

the formatting of the lines works the way I want it to work, however, the script only seems to work on every 2nd line of the inputs, or at least only return every 2nd line. I tried to initiate the while loop in many different ways and tried a lot of other things like initiating the arrays outside of the while loop, but nothing works. can anybody tell me where I am going wrong in the script? (by the way I know that the style is not pretty, so if any of you have some advise on how to make this script more compact, please feel free to do so! I wanna learn after all.)

Upvotes: 0

Answers (3)

i alarmed alien

Reputation: 9530

A few quick tips:

Whenever you have lines of very similar repeated code, it's a sign there may be a more economical way to do things. For example, this code:

for (my $i = 11; $i < scalar(@HMP_columns); $i++) { # for every column (starting from column 12)
    if ($HMP_columns[$i] == "1") {$HMP_columns[$i] = $uniq_AL_columns[0]}
    elsif ($HMP_columns[$i] == "2") {$HMP_columns[$i] = $uniq_AL_columns[1]}
    elsif ($HMP_columns[$i] == "3") {$HMP_columns[$i] = $uniq_AL_columns[2]}
    elsif ($HMP_columns[$i] == "4") {$HMP_columns[$i] = $uniq_AL_columns[3]}
    elsif ($HMP_columns[$i] == "5") {$HMP_columns[$i] = $uniq_AL_columns[4]}
    elsif ($HMP_columns[$i] == "6") {$HMP_columns[$i] = $uniq_AL_columns[5]}
    elsif ($HMP_columns[$i] == "7") {$HMP_columns[$i] = $uniq_AL_columns[6]}
    elsif ($HMP_columns[$i] == "8") {$HMP_columns[$i] = $uniq_AL_columns[7]}
    elsif ($HMP_columns[$i] == "9") {$HMP_columns[$i] = $uniq_AL_columns[8]}
}

could be refactored - the new value of $HMP_columns[$i] is $uniq_AL_columns[ $HMP_columns[$i-1] ], so the whole block could be rewritten as

for (my $i = 11; $i < scalar(@HMP_columns); $i++) {
    $HMP_columns[$i] = $uniq_AL_columns[$HMP_columns[$i-1]];
}

If you're dealing with arrays, the splice function is extremely handy for doing all sorts of manipulations -- remove elements from an array, replace them with other elements, and so on. shift @AL_columns for 0..4 can be written as splice(@AL_columns, 0, 5); (i.e. remove 5 elements from @AL_columns starting at element 0).

For any perl work, it really helps to write debugging statements and tests while you're developing scripts so that you can be sure that you know what is going on. Sprinkling your code liberally with statements that 'say' the values of variables while the script executes can really aid in the debugging process. Once the code goes into production use, the 'say' statements can be commented out, or you can use an environment or program variable to turn them on and off -- e.g.

say "\$a is $a; \%b is " . Dumper(\%b) if $verbose; ## set a variable 'VERBOSE'

If you haven't discovered Data::Dumper, it will become your best friend during development: it allows you to inspect the contents of data structures and objects and work out why your carefully-written code isn't working as it should!

Perl testing is too big an issue to cover in an SO answer, but have a look at Test and Test::Simple in the perl docs. Writing a test suite while you write your scripts can save a lot of time--and it's much easier than doing it afterwards!

Upvotes: 1

imran

Reputation: 1560

In each loop you are reading from each file twice. Once in the while condition clause and once in the while body.

Replace your while line with:

use IO::Handle;
while (!($HMP->eof() or $AL->eof()) {

This will cause the while condition to fail if either file reaches end. If it is possible for the files to have different lengths, you can check which one did not reach eof after the while loop.

Upvotes: 2

AnFi

Reputation: 10903

You ignore lines fetched in while statement and fetch another lines in split.

Below please find fixed code. It does not detect situation when AL has more lines than HNP.

while (<$HMP>) {
  my @HMP_columns = split (/\t/, $_);
  if( not defined( $_  = <$AL>)) {
     die "HMP longer than AL";
  }
  my @AL_columns = split (/\t/, $_);

Upvotes: 0

line-by-line comparison only returns every second line

Answers (3)

Related Questions