Reputation: 79
I have two files. One file has a list of values like so
NC_SNPStest.txt
250
275
375
The other file has space-delimited information. Column 1 is the start of a range, column 2 is the end of the range, column 5 has the name of the range, and column 8 has what acts on that range.
promoterstest.txt
20 100 yaaX F yaaX 5147 5.34 Sigma70 99
200 300 yaaA R yaaAp1 6482 6.54 Sigma70 35
350 400 yaaA R yaaAp2 6498 2.86 Sigma70 51
I am trying to write a script that takes the first line from file 1 and then parses file 2 line by line to see if that value falls within the range given by the first two columns.
When the first match is found, I want to print the value from file 1 and then the values in file 2 for columns 5 and 8 from the line with the match. If no match is found in File 2 then just print the value from File 1 and move on.
It seems like it should be a simple enough task, but I'm having an issue cycling through both files.
This is what I have written:
#!/usr/bin/perl
use warnings;
use strict;
open my $PromoterFile, '<', 'promoterstest.txt' or die $!;
open my $SNPSFile, '<', 'NC_SNPStest.txt' or die $!;
open (FILE, ">PromoterMatchtest.txt");
while (my $SNPS = <$SNPSFile>) {
    chomp ($SNPS);
    while (my $Cord = <$PromoterFile>) {
        chomp ($Cord);
        my @CordFile = split(/\s/, $Cord);
        my $Lend = $CordFile[0];
        my $Rend = $CordFile[1];
        my $Promoter = $CordFile[4];
        my $SigmaFactor = $CordFile[7];
        foreach $a ($SNPS)
        {
            if ($a >= $Lend && $a <= $Rend)
            {
                print FILE "$a\t$CordFile[4]\t$CordFile[7]\n";
            }
            else
            {
                print FILE "$a\n";
            }
        }
    }
}
close FILE;
close $PromoterFile;
close $SNPSFile;
exit;
So far my output looks like so:
250
250 yaaAp1 Sigma70
250
The first line of file 1 is being read and file 2 is being cycled through. But the else branch runs for every line of file 2 that doesn't match, and the script never cycles through the other lines of file 1.
Upvotes: 1
Views: 85
Reputation: 126722
Here's my take on a programming solution. It's important to use lexical file handles and the three-parameter form of open, and to keep to lower-case letters, digits and underscores for local variables.
I have also used the autodie pragma to remove the need to test the status of open explicitly, and the first function from the core library List::Util to make the code clearer and more concise.
use strict;
use warnings;
use 5.010;
use autodie;
use List::Util 'first';

# Load the range start, range end, name and sigma factor of each promoter.
my @promoters;
{
    open my $fh, '<', 'promoterstest.txt';
    while ( <$fh> ) {
        my @fields = split;
        push @promoters, [ @fields[0,1,4,7] ];
    }
}

open my $fh, '<', 'NC_SNPStest.txt';
open my $out_fh, '>', 'PromoterMatchtest.txt';
select $out_fh;    # a plain print now writes to the output file

while ( <$fh> ) {
    my ($num) = split;
    # First promoter whose range contains $num, or undef if none does.
    my $match = first { $num >= $_->[0] and $num <= $_->[1] } @promoters;
    if ( $match ) {
        print join("\t", $num, @{$match}[2,3]), "\n";
    }
    else {
        print $num, "\n";
    }
}
This writes the following to PromoterMatchtest.txt:
250 yaaAp1 Sigma70
275 yaaAp1 Sigma70
375 yaaAp2 Sigma70
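As a small illustration of how first from List::Util behaves in the loop above, here is a sketch using made-up, deliberately overlapping ranges. The @ranges data and the lookup helper are hypothetical and only there for the demonstration; first itself returns the first element for which the block is true, or undef when nothing matches.

```perl
use strict;
use warnings;
use List::Util 'first';

# Made-up ranges as [start, end, name]; they overlap on purpose.
my @ranges = ( [ 200, 300, 'A' ], [ 250, 400, 'B' ] );

# lookup() is a hypothetical helper, not part of the answer's script.
sub lookup {
    my ($num) = @_;
    # first() stops at the first element for which the block is true.
    my $match = first { $num >= $_->[0] and $num <= $_->[1] } @ranges;
    return defined $match ? $match->[2] : 'none';
}

print "$_ => ", lookup($_), "\n" for 275, 350, 10;
```

Here 275 lies in both ranges and is reported as A because that range comes first, 350 is reported as B, and 10 falls in no range and is reported as none.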
Upvotes: 2
Reputation: 53478
Your problem is that you're not resetting your position in the second file. You read one line from $SNPSFile and check it against every line in the promoter file.
But when you start over with the next value, $PromoterFile is already at end of file, so:
while (my $Cord = <$PromoterFile>) {
doesn't have anything to read.
A quick fix for this would be to add a seek call to rewind $PromoterFile before the inner loop, but that means re-reading the whole file for every value, which is inefficient. I'd suggest instead reading the promoter file into an array and referencing that.
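For reference, the seek quick fix could look roughly like the sketch below. It reads the promoter data from an in-memory string instead of promoterstest.txt so that it runs on its own, and find_promoter is a hypothetical helper name; both are assumptions made for the illustration. The important line is the seek call, which rewinds the handle before each scan.

```perl
use strict;
use warnings;

# In-memory stand-in for promoterstest.txt (sample data from the question).
my $promoters = <<'END';
20 100 yaaX F yaaX 5147 5.34 Sigma70 99
200 300 yaaA R yaaAp1 6482 6.54 Sigma70 35
350 400 yaaA R yaaAp2 6498 2.86 Sigma70 51
END

open my $promoter_fh, '<', \$promoters or die $!;

# find_promoter() is a hypothetical helper, not part of the original script.
sub find_promoter {
    my ($fh, $value) = @_;
    seek $fh, 0, 0 or die "seek failed: $!";    # rewind to the start
    while ( my $line = <$fh> ) {
        my @fields = split ' ', $line;
        return "$value\t$fields[4]\t$fields[7]"
            if $value >= $fields[0] and $value <= $fields[1];
    }
    return $value;    # no range contains the value
}

print find_promoter($promoter_fh, $_), "\n" for 250, 275, 375;
```

Every call scans the file again from the top, which is why loading the ranges into memory once is the better choice for anything but small inputs.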
Here's a first draft rewrite that may help.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

open my $PromoterFile, '<', 'promoterstest.txt' or die $!;
open my $SNPSFile, '<', 'NC_SNPStest.txt' or die $!;
open my $output, '>', 'PromoterMatchtest.txt' or die $!;

# Read every promoter range into an array of hashes.
my @data;
while (<$PromoterFile>) {
    chomp;
    my @CordFile = split;
    push(
        @data,
        {   lend        => $CordFile[0],
            rend        => $CordFile[1],
            promoter    => $CordFile[4],
            sigmafactor => $CordFile[7]
        }
    );
}
print Dumper \@data;

# Check each SNP value against the stored ranges.
while ( my $value = <$SNPSFile> ) {
    chomp $value;
    my $found = 0;
    foreach my $element (@data) {
        if (    $value >= $element->{lend}
            and $value <= $element->{rend} )
        {
            #print "Found $value\n";
            print {$output} join( "\t",
                $value, $element->{promoter}, $element->{sigmafactor} ),
                "\n";
            $found++;
            last;    # stop at the first matching range
        }
    }
    if ( not $found ) {
        print {$output} $value, "\n";
    }
}

close $output;
close $PromoterFile;
close $SNPSFile;
First, we open file 2 and read its contents into an array of hashes. (If any of the fields there are unique, we could key a hash off that instead.)
Then we read through the SNPs file one line at a time, looking for a range that contains each value. On the first hit we print the value together with the promoter and sigma factor; if no range matches, we print just the value.
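The parenthetical remark about keying off a unique field could be sketched as follows. This assumes, purely for illustration, that the promoter name in column 5 is unique across the file (it is in the three sample lines, but nothing in the question guarantees it); %by_promoter is a hypothetical name.

```perl
use strict;
use warnings;

# Sample lines from the question's promoterstest.txt.
my @lines = (
    '20 100 yaaX F yaaX 5147 5.34 Sigma70 99',
    '200 300 yaaA R yaaAp1 6482 6.54 Sigma70 35',
    '350 400 yaaA R yaaAp2 6498 2.86 Sigma70 51',
);

# Assumption: the promoter name (column 5) is unique, so it can be a hash key.
my %by_promoter;
for my $line (@lines) {
    my @fields = split ' ', $line;
    $by_promoter{ $fields[4] } = {
        lend        => $fields[0],
        rend        => $fields[1],
        sigmafactor => $fields[7],
    };
}

my $p = $by_promoter{yaaAp1};
print "yaaAp1 spans $p->{lend} to $p->{rend} ($p->{sigmafactor})\n";
```

A hash like this gives direct access by promoter name, but finding which range contains a given value still means checking the ranges one by one, so the array of hashes remains a reasonable fit for this task.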
This generates the output:
250 yaaAp1 Sigma70
275 yaaAp1 Sigma70
375 yaaAp2 Sigma70
Was that what you were aiming for?
The script also contains a Dumper statement, which prints the content of @data to the screen like this:
$VAR1 = [
          {
            'sigmafactor' => 'Sigma70',
            'promoter' => 'yaaX',
            'lend' => '20',
            'rend' => '100'
          },
          {
            'sigmafactor' => 'Sigma70',
            'promoter' => 'yaaAp1',
            'rend' => '300',
            'lend' => '200'
          },
          {
            'promoter' => 'yaaAp2',
            'sigmafactor' => 'Sigma70',
            'rend' => '400',
            'lend' => '350'
          }
        ];
Upvotes: 3