Using perl to cycle through a list of values (x) and compare to another file with value ranges

Question

I have two files. One file has a list of values like so

NC_SNPStest.txt

250 
275 
375

The other file has space-delimited information. Column 1 is the start of a range, column 2 is the end of the range, column 5 has the name of the range, and column 8 has what acts on that range.

promoterstest.txt

20 100 yaaX F yaaX 5147 5.34 Sigma70 99
200 300 yaaA R yaaAp1 6482 6.54 Sigma70 35
350 400 yaaA R yaaAp2 6498 2.86 Sigma70 51

I am trying to write a script that takes the first line from file 1 and then parses file 2 line by line to see whether that value falls within the range given by the first two columns.

When the first match is found, I want to print the value from file 1 and then the values in file 2 for columns 5 and 8 from the line with the match. If no match is found in File 2 then just print the value from File 1 and move on.

It seems like it should be a simple enough task, but I'm having an issue cycling through both files.

This is what I have written:

#!/usr/bin/perl

use warnings;
use strict;

open my $PromoterFile, '<', 'promoterstest.txt' or die $!;
open my $SNPSFile, '<', 'NC_SNPtest.txt' or die $!; 
open (FILE, ">PromoterMatchtest.txt");

while (my $SNPS = <$SNPSFile>) {
   chomp ($SNPS);

   while (my $Cord = <$PromoterFile>) {
      chomp ($Cord);

      my @CordFile =split(/\s/, $Cord); 
      my $Lend = $CordFile[0];
      my $Rend = $CordFile[1];
      my $Promoter = $CordFile[4];
      my $SigmaFactor = $CordFile[7];

      foreach $a ($SNPS) 
      {
         if ($a >= $Lend && $a <= $Rend)
         {
            print FILE "$a\t$CordFile[4]\t$CordFile[7]\n";
         }
         else 
         { 
            print FILE "$a\n";
         }
      }
   }
}
close FILE;
close $PromoterFile;
close $SNPSFile;
exit;

So far my output looks like so:

250
250     yaaAp1  Sigma70
250

So the first line of file 1 is read and file 2 is cycled through, but the else branch fires for every non-matching line of file 2, and the script never moves on to the other lines of file 1.

Sobrique · Accepted Answer

Your problem is that you're not resetting your position in the second file. You read one line from $SNPSFile and check it against every line in the second file.

But when you start over, you're already at the end of file, so:

while (my $Cord = <$PromoterFile>) {

Doesn't have anything to read.

A quick fix for this would be to add a seek call in there to rewind the second file, but that makes for inefficient code, because file 2 gets re-read and re-split for every value. I'd suggest instead reading file 2 into an array, and searching that instead.
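For completeness, here is roughly what the seek quick fix could look like. This is only a sketch: it uses in-memory filehandles with the sample data from the question so it runs on its own, and @lines is a name I made up to collect the output. With real files you'd open them by name as in the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data from the question, held in strings so the sketch is self-contained.
my $promoters = <<'END';
20 100 yaaX F yaaX 5147 5.34 Sigma70 99
200 300 yaaA R yaaAp1 6482 6.54 Sigma70 35
350 400 yaaA R yaaAp2 6498 2.86 Sigma70 51
END
my $snps = "250\n275\n375\n";

# In-memory filehandles stand in for the real files (an assumption of this sketch).
open my $PromoterFile, '<', \$promoters or die $!;
open my $SNPSFile,     '<', \$snps      or die $!;

my @lines;    # collected output lines (hypothetical name)

while ( my $SNPS = <$SNPSFile> ) {
    chomp $SNPS;

    # Rewind the promoter handle so the inner loop starts from the top again.
    seek( $PromoterFile, 0, 0 ) or die $!;

    my $found = 0;
    while ( my $Cord = <$PromoterFile> ) {
        my @CordFile = split ' ', $Cord;
        if ( $SNPS >= $CordFile[0] && $SNPS <= $CordFile[1] ) {
            push @lines, join( "\t", $SNPS, $CordFile[4], $CordFile[7] );
            $found = 1;
            last;    # stop at the first matching range
        }
    }

    # Print the bare value only once, after every range has been checked.
    push @lines, $SNPS unless $found;
}

print "$_\n" for @lines;
```

Note that the "no match" line is printed after the inner loop rather than in an else branch, which is what stops the bare value from being printed once per non-matching line.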

Here's a first draft rewrite that may help.

#!/usr/bin/perl

use warnings;
use strict;

use Data::Dumper;

open my $PromoterFile, '<', 'promoterstest.txt'     or die $!;
open my $SNPSFile,     '<', 'NC_SNPtest.txt'        or die $!;
open my $output,       ">", "PromoterMatchtest.txt" or die $!;


my @data;

# Load every promoter range into memory so it only has to be read once.
while (<$PromoterFile>) {
    chomp;
    my @CordFile = split;    # split $_ on whitespace

    push(
        @data,
        {   lend        => $CordFile[0],    # column 1: start of range
            rend        => $CordFile[1],    # column 2: end of range
            promoter    => $CordFile[4],    # column 5: name of the range
            sigmafactor => $CordFile[7]     # column 8: what acts on the range
        }
    );
}

print Dumper \@data;

foreach my $value (<$SNPSFile>) {
    chomp $value;
    my $found = 0;
    foreach my $element (@data) {
        if (    $value >= $element->{lend}
            and $value <= $element->{rend} )
        {
            #print "Found $value\n";
            print {$output} join( "\t",
                $value, $element->{promoter}, $element->{sigmafactor} ),
                "\n";
            $found++;
            last;
        }

    }
    if ( not $found ) {
        print {$output} $value,"\n";
    }
}

close $output;
close $PromoterFile;
close $SNPSFile;

First, we open file 2 and read its contents into an array of hashes. (If any of the fields there are unique, we could use a hash keyed on that field instead.)

Then we go through the SNP file one value at a time, checking each value against the stored ranges. On the first hit we print the value with its promoter and sigma factor; if nothing matches, we print just the value.
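If you'd rather not manage the $found flag by hand, the "first hit" search can probably be written with first from List::Util instead. This is a swapped-in variant rather than the code above, and find_range and format_line are hypothetical helper names of mine. The ranges are written inline here so the sketch runs without the input files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(first);

# Ranges from the question's promoterstest.txt, written inline for this sketch.
my @data = (
    { lend => 20,  rend => 100, promoter => 'yaaX',   sigmafactor => 'Sigma70' },
    { lend => 200, rend => 300, promoter => 'yaaAp1', sigmafactor => 'Sigma70' },
    { lend => 350, rend => 400, promoter => 'yaaAp2', sigmafactor => 'Sigma70' },
);

# find_range (hypothetical helper) returns the first hash whose range
# contains $value, or undef when no range matches.
sub find_range {
    my ( $value, $ranges ) = @_;
    return first { $value >= $_->{lend} and $value <= $_->{rend} } @$ranges;
}

# format_line (hypothetical helper) builds one output line, without the newline.
sub format_line {
    my ( $value, $ranges ) = @_;
    my $hit = find_range( $value, $ranges );
    return $hit
        ? join( "\t", $value, $hit->{promoter}, $hit->{sigmafactor} )
        : $value;
}

# 150 is an extra value, not in the question's data, to show the no-match case.
print format_line( $_, \@data ), "\n" for ( 250, 275, 375, 150 );
```

Like the nested loop, first stops at the first matching range, so a value that falls in overlapping ranges is still reported only once.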

This generates the output:

250     yaaAp1  Sigma70
275     yaaAp1  Sigma70
375     yaaAp2  Sigma70

Was that what you were aiming for?

That is apart from the 'Dumper' statement, which prints the contents of @data like this:

$VAR1 = [
          {
            'sigmafactor' => 'Sigma70',
            'promoter' => 'yaaX',
            'lend' => '20',
            'rend' => '100'
          },
          {
            'sigmafactor' => 'Sigma70',
            'promoter' => 'yaaAp1',
            'rend' => '300',
            'lend' => '200'
          },
          {
            'promoter' => 'yaaAp2',
            'sigmafactor' => 'Sigma70',
            'rend' => '400',
            'lend' => '350'
          }
        ];
