Reputation: 87
I have some successful code that pulls out the range information from a FASTA title if a numerical position (from another file) lies inside it, and prints the regex captures and original position as a result.
File 1 sample data:
7065_8#10 9436 - t
7065_8#10 126477 - c
7065_8#10 413711 + T
File 2 sample data:
>SAEMRSA15_00020 dnaN_DNA_polymerase_III,_beta_chain 2156 3289 forward
ATGATGGAATTCACTATTAAAAGAGATTATTTTATTACACAATTAAATGACACATTAAAAGCTATTTCACCAAGAACAACATTACCT
>SAEMRSA15_00060 gyrA_DNA_gyrase_subunit_A 7005 9674 forward
ATGTCGGAAAAAGAAATTTGGGA
Code:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $outputfile = "/Users/edwardtickle/Documents/CC22CDS.txt";
open FILE1, "/Users/edwardtickle/Documents/CC22indels.tab";
open FILE2, "/Users/edwardtickle/Documents/CC22_CDS_rmmge.aln";
open( OUTPUTFILE, ">$outputfile" );
my @file1list = ();
while (<FILE1>) {
if (/^\S+\s+(\d+)/) {
push @file1list, $1;
}
}
close FILE1;
while (<FILE2>) {
if (/^>(\S+\s+\S+\s+(\d+)\s+(\d+)\s+(\S+))/) {
my $cds1 = $1;
my $cds2 = $2;
my $cds3 = $3;
my $cds4 = $4;
for my $cc22 (@file1list) {
if ( $cc22 > $cds2 && $cc22 < $cds3 ) {
print OUTPUTFILE "$cc22 $cds2 $cds3 $cds4\n";
}
}
}
}
close FILE2;
eg output:
9436 7005 9674 forward
Alongside this captured information, i would like to print the next line of the FASTA file after a match, which includes the sequence data for that gene. I would like the next line to be printed on the same line after the original data. This sounds incredibly simple on paper but i can't work out how to do it! I have attempted to use a previous answer and incorporate it into my code to no avail (shown below). If possible could you please adapt my code instead of suggesting a completely new method for the original correct code, i am trying to make sure i understand every bit of script i use as i go along rather than simply pasting in an answer.
Desired output:
9436 7005 9674 forward ATGTCGGAAAAAGAAATTTGGGA
Incorrect code:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $outputfile = "/Users/edwardtickle/Documents/CC22CDS.txt";
open FILE1, "/Users/edwardtickle/Documents/CC22indels.tab";
open FILE2, "/Users/edwardtickle/Documents/CC22_CDS_rmmge.aln";
open( OUTPUTFILE, ">$outputfile" );
my @file1list = ();
while (<FILE1>) {
if (/^\S+\s+(\d+)/) {
push @file1list, $1;
}
}
my $nextline = 0;
close FILE1;
while ( my $line = <FILE2> ) {
if (/^>(\S+\s+\S+\s+(\d+)\s+(\d+)\s+(\S+))/) {
my $cds1 = $1;
my $cds2 = $2;
my $cds3 = $3;
my $cds4 = $4;
for my $cc22 (@file1list) {
if ( $cc22 > $cds2 && $cc22 < $cds3 ) {
if ($nextline) {
print OUTPUTFILE "$cc22 $cds2 $cds3 $cds4 $nextline\n";
$nextline = ( $line =~ /^>(\S+\s+\S+\s+(\d+)\s+(\d+)\s+(\S+))/ );
}
}
}
}
}
close FILE2;
Thanks in advance!
Upvotes: 0
Views: 108
Reputation: 9520
The problem is that you never give $nextline
a value:
for my $cc22 (@file1list) {
if ( $cc22 > $cds2 && $cc22 < $cds3 ) {
if ($nextline) {
...
}
}
}
There's nowhere in that loop for $nextline
to get set, so the if ($nextline)
statement is never executed. To change that, you need to change your code so $nextline
does get initialised. Since you haven't posted any input data, it's a little difficult to know exactly what should be done, but assuming you get a match on one line and then want to print some details and the sequence from the line after the match, you can edit your code to the following:
my $nextline;
while ( my $line = <FILE2> ) {
if ($line =~ /^>(\S+)\s+\S+\s+(\d+)\s+(\d+)\s+(\S+)/) {
my $cds1 = $1;
my $cds2 = $2;
my $cds3 = $3;
my $cds4 = $4;
# pull in the next line
$nextline = <FILE2>;
for my $cc22 (@file1list) {
if ( $cc22 > $cds2 && $cc22 < $cds3 ) {
# print out the first part of the line without a line break
# and the next line, which already has the line break on it.
print OUTPUTFILE "$cc22 $cds2 $cds3 $cds4 $nextline";
}
}
}
}
using the following input for file 1:
7065_8#10 992 - t
7065_8#10 2264 - c
7065_8#10 413711 + T
Output:
992 517 1878 forward ATGTCGGAAAAAGAAATTTGGGA
2264 2156 3289 forward ATGATGGAATTCACTATTAAAAGAGATTATTTTATTACACAATTAAATGACACATTAAAAGCTATTTCACCAAGAACAACATTACCT
Upvotes: 2
Reputation: 241848
You start with $nextline = 0
. Then, you only change $nextline in the block that starts with if ($nextline)
. The block is never entered, as $nextline is 0 (false). So, $nextline remains 0 till the end of the script.
Upvotes: 0