user625913

Reputation: 1

A little help with loops in Perl

I am having trouble specifying the correct algorithm. I am iterating over the data from an input file with loops, and the issue is in the last loop.

#!/usr/bin/perl 
# Lab #4
# Judd Bittman

# http://www-users.cselabs.umn.edu/classes/Spring-2011/csci3003/index.php?page=labs 
# this site has what needs to be in the lab
# lab4 is the lab instructions
# yeast protein is the part that is being read

use warnings;
use strict;

my $file = "YeastProteins.txt";
open(my $proteins, '<', $file) or die "Cannot open $file: $!";
my @identifier;
my @qualifier;
my @molecularweight;
my @pi;
while (my $line1 = <$proteins>) {
    #print $line1;
    chomp($line1);
    my @line = split(/\t/, $line1);
    push(@identifier,      $line[0]);
    push(@qualifier,       $line[1]);
    push(@molecularweight, $line[2]);
    push(@pi,              $line[3]);
}
my $extreme  = 0;
my $ex_index = 0;
for (my $index = 1; $index < 6805; $index++) {
    if (   defined($identifier[$index])
        && defined($qualifier[$index])
        && defined($molecularweight[$index])
        && defined($pi[$index])) {
# print"$identifier[$index]\t:\t$qualifier[$index]:\t$molecularweight[$index]:\n$pi[$index]";
    }
    if (   defined($identifier[$index])
        && defined($qualifier[$index])
        && defined($pi[$index])) {
        if (abs($pi[$index] - 7) > $extreme && $qualifier[$index] eq "Verified")
        {
            $extreme  = abs($pi[$index] - 7);
            $ex_index = $identifier[$index];
            print $extreme. " " . $ex_index . "\n";
        }
    }
}
print $extreme;
print "\n";
print $ex_index;
print "\n";

# the part above does part b of the assignment
# YLR204W, it's part of the mitochondrial inner membrane protein as well as a processor.
my $exindex = 0;
my $high    = 0;

# two lines above and below is part c
# there is an error and I know there is something wrong
for (my $index = 1; $index < 6805; $index++) {
    if (   defined($qualifier[$index])
        && ($qualifier[$index]) eq "Verified"
        && defined($molecularweight[$index])
        && (abs($molecularweight[$index]) > $high)) {
        $high    = (abs($molecularweight[$index]) > $high);    # something wrong on this line, I know I wrote something wrong
        $exindex = $identifier[$index];
    }
}

print $high;
print "\n";
print $exindex;
print "\n";
close($proteins);
exit;

In the final loop, I want to keep track of the protein that is Verified and has the highest molecular mass in the input file. What code can I use to tell the program to hold on to the highest number and its name? I feel like I am very close, but I can't put my finger on it.

Upvotes: 0

Views: 230

Answers (3)

Francisco R

Reputation: 4048

Just change:

$high    = (abs($molecularweight[$index]) > $high);

To this:

$high    = abs($molecularweight[$index]) if (abs($molecularweight[$index]) > $high);

At the end of the loop, $high will hold the highest molecular weight among the Verified entries in the @molecularweight array.
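As a rough sketch of how the corrected loop might look in context, here is a small self-contained version. The identifiers, qualifiers, and weights are invented sample values standing in for YeastProteins.txt, and the loop starts at index 0 because this sample has no header row (the question's loop starts at 1, presumably to skip one). Because the enclosing if already checks that the weight exceeds $high, a plain assignment inside it is enough:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the parsed input file.
my @identifier      = ('ID_A', 'ID_B', 'ID_C', 'ID_D');
my @qualifier       = ('Verified', 'Dubious', 'Verified', 'Verified');
my @molecularweight = (1200.5, 99999, 45000.25, 3100);

my $high    = 0;
my $exindex = '';

for (my $index = 0; $index < @identifier; $index++) {
    if (   defined($qualifier[$index])
        && $qualifier[$index] eq 'Verified'
        && defined($molecularweight[$index])
        && abs($molecularweight[$index]) > $high) {
        # Store the weight itself, not the result of the comparison.
        $high    = abs($molecularweight[$index]);
        $exindex = $identifier[$index];
    }
}

print "$high\n$exindex\n";
```

The Dubious row has the largest weight in this sample but is skipped, so the loop ends up holding the heaviest Verified entry and its identifier.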

Upvotes: 1

BadFileMagic

Reputation: 701

You likely want a more complex data structure, such as a nested hash. It's hard to give a solid example without more knowledge of the data, but say your first identifier were abc, the second def, and so on:

my %protein_entries = (
    abc => {
        qualifier        => 'something',
        molecular_weight => 1234,
        pi               => 'something',
    },
    def => {
        qualifier        => 'something else',
        molecular_weight => 5678,
        pi               => 'something else',
    },
    # …
);

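Then, rather than having several different arrays and keeping track of which element belongs to which, you get at the elements by identifier and field name, for example $protein_entries{abc}{qualifier} or $protein_entries{abc}{molecular_weight}.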

Then, if you want to get at the highest by molecular weight, you can sort the identifiers by their molecular weight, then slice off the highest one:

my $highest = (sort {
    $protein_entries{$a}{molecular_weight} 
    <=> 
    $protein_entries{$b}{molecular_weight}
} keys %protein_entries)[-1];

Basically, you're having problems with your algorithm because your data isn't structured properly.

In this example, $highest will hold def. Later you can go back and fetch $protein_entries{def}{molecular_weight} or any of the other keys in the anonymous hash referenced by $protein_entries{def}, which makes it easy to recall any relevant associated data.
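To tie this back to the question, here is a minimal sketch of building such a hash from tab-separated lines and picking the heaviest Verified entry. The rows are invented for illustration, the column order (identifier, qualifier, molecular weight, pI) is taken from the question's code, and the grep step for Verified entries is an addition that the sort above does not include:

```perl
use strict;
use warnings;

# Invented tab-separated rows: identifier, qualifier, molecular weight, pI.
my @rows = (
    "ID_A\tVerified\t1200.5\t5.2",
    "ID_B\tDubious\t99999\t9.8",
    "ID_C\tVerified\t45000.25\t7.1",
);

# One hash entry per identifier, each holding an anonymous hash of fields.
my %protein_entries;
for my $row (@rows) {
    my ($id, $qualifier, $weight, $pi) = split /\t/, $row;
    $protein_entries{$id} = {
        qualifier        => $qualifier,
        molecular_weight => $weight,
        pi               => $pi,
    };
}

# Keep only Verified entries, sort ascending by weight, take the last one.
my $highest = (
    sort {
        $protein_entries{$a}{molecular_weight}
        <=>
        $protein_entries{$b}{molecular_weight}
    }
    grep { $protein_entries{$_}{qualifier} eq 'Verified' }
    keys %protein_entries
)[-1];

print "$highest\t$protein_entries{$highest}{molecular_weight}\n";
```

Sorting is convenient but does more work than a single pass that tracks the maximum; for a few thousand rows the difference is unlikely to matter.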

Upvotes: 1

pdehaan

Reputation: 342

First, a note about Perl: in general, it's more common to use foreach-style loops than C-style indexed loops. For example:

for my $protein (@proteins) {
  # do something with $protein
}

(Your situation might require an indexed loop; I just thought I'd mention this.)
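If the data stays in parallel arrays, as in the question, a foreach loop over the index range is one possible middle ground: the index is still available, but the row count is not hard-coded. A small sketch with placeholder values (the array contents and the @report name are made up for illustration):

```perl
use strict;
use warnings;

# Placeholder values; the real arrays are filled from the input file.
my @identifier = ('ID_A', 'ID_B', 'ID_C');
my @pi         = (5.2, 9.8, 7.1);

my @report;

# $#identifier is the last valid index, so no row count is hard-coded.
for my $index (0 .. $#identifier) {
    push @report, "$identifier[$index]\t$pi[$index]";
}

print "$_\n" for @report;
```

To skip a header row, as the question's loops appear to do, the range could start at 1 instead of 0.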

To address your specific issue though:

$high = (abs($molecularweight[$index])>$high);

$high is being set to the result of the boolean test, so it ends up as 1 (true) rather than the molecular weight. Remove the >$high part (which is already being tested in your if statement) and you'll likely end up with what you intended.

Upvotes: 3
