PlutonicFriend

Reputation: 79

Using perl to cycle through a list of values (x) and compare to another file with value ranges

I have two files. One file has a list of values like so

NC_SNPStest.txt

250 
275 
375

The other file has space-delimited information. Column 1 is the first value of a range, column 2 is the second value of the range, column 5 has the name of the range, and column 8 has what acts on that range.

promoterstest.txt

20 100 yaaX F yaaX 5147 5.34 Sigma70 99
200 300 yaaA R yaaAp1 6482 6.54 Sigma70 35
350 400 yaaA R yaaAp2 6498 2.86 Sigma70 51

I am trying to write a script that takes the first line from file 1 and then parses file 2 line by line to see if that value falls within the range given by the first two columns.

When the first match is found, I want to print the value from file 1 and then the values in file 2 for columns 5 and 8 from the line with the match. If no match is found in File 2 then just print the value from File 1 and move on.

It seems like it should be a simple enough task, but I'm having an issue cycling through both files.

This is what I have written:

#!/usr/bin/perl

use warnings;
use strict;

open my $PromoterFile, '<', 'promoterstest.txt' or die $!;
open my $SNPSFile, '<', 'NC_SNPtest.txt' or die $!; 
open (FILE, ">PromoterMatchtest.txt");

while (my $SNPS = <$SNPSFile>) {
   chomp ($SNPS);

   while (my $Cord = <$PromoterFile>) {
      chomp ($Cord);

      my @CordFile =split(/\s/, $Cord); 
      my $Lend = $CordFile[0];
      my $Rend = $CordFile[1];
      my $Promoter = $CordFile[4];
      my $SigmaFactor = $CordFile[7];

      foreach $a ($SNPS) 
      {
         if ($a >= $Lend && $a <= $Rend)
         {
            print FILE "$a\t$CordFile[4]\t$CordFile[7]\n";
         }
         else 
         { 
            print FILE "$a\n";
         }
      }
   }
}
close FILE;
close $PromoterFile;
close $SNPSFile;
exit;

So far my output looks like so:

250
250     yaaAp1  Sigma70
250

The first line of file 1 is read and file 2 is cycled through, but the else branch runs for every non-matching line of file 2, and the script never moves on to the other lines of file 1.

Upvotes: 1

Views: 85

Answers (2)

Borodin

Reputation: 126722

Here's my take on a programming solution. It's important to

  • Use lexical file handles and the three-parameter form of open

  • Keep to lower-case letters, digits and underscores for local variables

I have also used the autodie pragma to remove the need to test the status of open explicitly, and the first function from the core module List::Util to make the code clearer and more concise.

use strict;
use warnings;
use 5.010;
use autodie;

use List::Util 'first';

my @promoters;
{
    open my $fh, '<', 'promoterstest.txt';
    while ( <$fh> ) {
        my @fields = split;
        push @promoters, [ @fields[0,1,4,7] ];
    }
}

open my $fh,     '<', 'NC_SNPStest.txt';
open my $out_fh, '>', 'PromoterMatchtest.txt';
select $out_fh;

while ( <$fh> ) {
    my ($num) = split;
    my $match = first { $num >= $_->[0] and $num <= $_->[1] } @promoters;
    if ( $match ) {
        print join("\t", $num, @{$match}[2,3]), "\n";
    }
    else {
        print $num, "\n";
    }
}

output

250 yaaAp1  Sigma70
275 yaaAp1  Sigma70
375 yaaAp2  Sigma70
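
The sample data has no value that falls outside every range, so the else branch never fires in that run. As a minimal sketch of what first returns in both cases, the snippet below uses inline copies of the three promoter ranges and adds a made-up value, 150, that is an assumption for illustration and not part of the question's data. first returns the first element for which the block is true, or undef when nothing matches.

```perl
use strict;
use warnings;

use List::Util 'first';

# Each entry is [ start, end, promoter, sigma factor ],
# the same layout as the rows pushed onto @promoters.
my @ranges = (
    [  20, 100, 'yaaX',   'Sigma70' ],
    [ 200, 300, 'yaaAp1', 'Sigma70' ],
    [ 350, 400, 'yaaAp2', 'Sigma70' ],
);

my @lines;
for my $num ( 250, 150 ) {    # 150 is a hypothetical value outside every range
    # $match is an array reference on a hit and undef on a miss
    my $match = first { $num >= $_->[0] and $num <= $_->[1] } @ranges;
    push @lines, $match ? join( "\t", $num, @{$match}[2,3] ) : $num;
}

print "$_\n" for @lines;
```

The first line printed carries the promoter and sigma factor for 250; the second is the bare value 150, because no range contains it.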

Upvotes: 2

Sobrique

Reputation: 53478

Your problem is that you're not resetting your progress through the second file. You read one line from $SNPSFile and check it against every line in the second file.

But when you start over, you're already at the end of file, so:

while (my $Cord = <$PromoterFile>) {

Doesn't have anything to read.

A quick fix for this would be to add a seek call in there, but that makes for inefficient code because the second file gets re-read for every value. I'd suggest instead reading file 2 into an array, and referencing that instead.
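
For reference, the seek approach might look roughly like the sketch below. To keep it runnable on its own it reads from in-memory strings rather than the real files; the variable names ($promoter_fh, $snp_fh, @hits) are assumptions for illustration, not taken from the question.

```perl
use strict;
use warnings;

# In-memory stand-ins for the two files, so the sketch runs on its own.
my $promoter_text = "200 300 yaaA R yaaAp1 6482 6.54 Sigma70 35\n"
                  . "350 400 yaaA R yaaAp2 6498 2.86 Sigma70 51\n";
my $snp_text = "250\n375\n";

open my $promoter_fh, '<', \$promoter_text or die $!;
open my $snp_fh,      '<', \$snp_text      or die $!;

my @hits;
while ( my $snp = <$snp_fh> ) {
    chomp $snp;

    # Rewind the promoter handle to byte 0 before every pass.
    # Without this, the inner loop only has data for the first SNP.
    seek( $promoter_fh, 0, 0 ) or die "seek failed: $!";

    while ( my $line = <$promoter_fh> ) {
        my @fields = split ' ', $line;
        if ( $snp >= $fields[0] and $snp <= $fields[1] ) {
            push @hits, "$snp\t$fields[4]\t$fields[7]";
            last;    # stop at the first matching range
        }
    }
}

print "$_\n" for @hits;
```

This works, but it re-reads and re-splits the promoter data once per value, which is the inefficiency mentioned above.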

Here's a first draft rewrite that may help.

#!/usr/bin/perl

use warnings;
use strict;

use Data::Dumper;

open my $PromoterFile, '<', 'promoterstest.txt'     or die $!;
open my $SNPSFile,     '<', 'NC_SNPtest.txt'        or die $!;
open my $output,       ">", "PromoterMatchtest.txt" or die $!;


my @data;

while (<$PromoterFile>) {
    chomp;

    # split with no arguments splits $_ on runs of whitespace
    my @CordFile = split;

    # keep only the columns we need: range start, range end,
    # promoter name and sigma factor
    push(
        @data,
        {   lend        => $CordFile[0],
            rend        => $CordFile[1],
            promoter    => $CordFile[4],
            sigmafactor => $CordFile[7]
        }
    );
}

print Dumper \@data;

foreach my $value (<$SNPSFile>) {
    chomp $value;
    my $found = 0;
    foreach my $element (@data) {
        if (    $value >= $element->{lend}
            and $value <= $element->{rend} )
        {
            #print "Found $value\n";
            print {$output} join( "\t",
                $value, $element->{promoter}, $element->{sigmafactor} ),
                "\n";
            $found++;
            last;
        }

    }
    if ( not $found ) {
        print {$output} $value,"\n";
    }
}

close $output;
close $PromoterFile;
close $SNPSFile;

First, we open file 2 and read its contents into an array of hashes. (If any of the fields there are unique, we could key a hash off that instead.)
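
As a rough sketch of the "key off that" idea: assuming the promoter name in column 5 is unique per line (true for the three sample lines, but not guaranteed for real data), the records could go into a hash keyed by that name. The name %by_promoter is made up for this example.

```perl
use strict;
use warnings;

# The three sample lines from promoterstest.txt, inlined for the sketch.
my @lines = (
    '20 100 yaaX F yaaX 5147 5.34 Sigma70 99',
    '200 300 yaaA R yaaAp1 6482 6.54 Sigma70 35',
    '350 400 yaaA R yaaAp2 6498 2.86 Sigma70 51',
);

# Assumption: column 5 (the promoter name) is unique per line.
my %by_promoter;
for my $line (@lines) {
    my @f = split ' ', $line;
    $by_promoter{ $f[4] } = {
        lend        => $f[0],
        rend        => $f[1],
        sigmafactor => $f[7],
    };
}

# Direct lookup by name, with no scan over an array.
my $p = $by_promoter{yaaAp1};
print "yaaAp1 spans $p->{lend}-$p->{rend} ($p->{sigmafactor})\n";
```

A hash like this speeds up lookups by name only. Testing whether a value falls inside a range still means checking the stored records one by one, so for the task in the question the array of hashes is the simpler fit.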

Then we read through the SNPs file one value at a time and check it against each range. On the first hit we print the value with the promoter and sigma factor and stop searching; if nothing matches, we print just the value.

This generates the output:

250     yaaAp1  Sigma70
275     yaaAp1  Sigma70
375     yaaAp2  Sigma70

Was that what you were aiming for?

Apart from that, the Dumper statement prints the content of @data to standard output, like this:

$VAR1 = [
          {
            'sigmafactor' => 'Sigma70',
            'promoter' => 'yaaX',
            'lend' => '20',
            'rend' => '100'
          },
          {
            'sigmafactor' => 'Sigma70',
            'promoter' => 'yaaAp1',
            'rend' => '300',
            'lend' => '200'
          },
          {
            'promoter' => 'yaaAp2',
            'sigmafactor' => 'Sigma70',
            'rend' => '400',
            'lend' => '350'
          }
        ];

Upvotes: 3
