Reputation: 3
The following is the content of the FASTA file fileA:
>1
PLAARRPRRGKSLAGFESLACSFPVVSRGFLASRSARSLSSEGGTMPDNRQ
PRNRQPRIRSGNEPRSAPAMEPDGRGAWAHSRAALDRLEKLLRCSRCTNIL
REPVCLGGCEHIFCSNCVSDCIGTGCPVCYTPAWIQDLKINRQLDSMIQL
>2
PLWRPAVPDAGRARPVWSRWSAASLWFLKASLLPALRGAFHPKAGRCRIIGS
RGTGSRGSAPGTSLVPRPPWNRMVAVPGPTVAPRSTAWRSCCAARVVLTF*E
SLCV*EDVSTSSVVIV*VTALELDVQCVTPRPGYKT*R*ID
>3
TPPLWRPAVPDAGRAWPVSSRWPAASRWFPEASLLPALRGAFHPKAGRCRII
GSRGTGSRGSAPGTSLVPRPPWNRMVAVPGPTVAPRSTAWRSCCAARVVLTF
Now I need to take fileA as input and find the mismatches between sequences 1 and 2, then between 1 and 3, and also report the nucleotide change at each mismatch. I have written a program so far, but it doesn't take fileA as input. Kindly help.
My problem is that I need fileA to be given as input. The sequences contain a newline character after every 51 nucleotides, and my program treats that newline character as part of the sequence when finding the mismatches.
Program:
$a=<>;$b=<>;
@mul=("$a","$b");
for($i=0;$i<scalar(@mul)-1;$i++) {
$source=$mul[$i];
print "\n\nComparision of source: $mul[$i]\n";
print "------------------------------------";
for($j=$i+1;$j<scalar(@mul);$j++) {
$sample=$mul[$j];
print "\n$sample ";
print "\n------\n";
$t=mutate($source,$sample);
print $t;
}
}
sub mutate {
my ($s1,$s2)=@_;
$temp="";
for($k=0;$k<length($s1);$k++) {
$seq1=substr($s1,$k,1);
$seq2=substr($s2,$k,1);
if($seq1 ne $seq2) {
$temp.="[$seq1($k)/$seq2($k)]";
}
}
return $temp;
}
Upvotes: 0
Views: 2037
Reputation: 753970
You probably want to read paragraphs, which are marked by two newlines in a row; this assumes the records in fileA are separated by blank lines. Hence:
use strict;
use warnings;
my(@a);
{
# Limit the scope in which you reset the $/ variable
local($/) = "\n\n";
while (<>)
{
s/^>[^\n]*\n//; # Drop the FASTA header line (e.g. ">1"), if present
s/\n+//g; # Remove all remaining newlines
push @a, $_;
}
}
# Now your array contains three items with no newlines - process away...
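If the records are not separated by blank lines, as in the listing in the question, one alternative is to set $/ to ">" so that each read returns one FASTA record. The following is only a minimal sketch under that assumption: read_fasta is a hypothetical helper name, the demo string stands in for fileA, and a ">" inside a header line would break it.

```perl
use strict;
use warnings;

# Read FASTA records from a filehandle; returns a hash of header => sequence.
# read_fasta is an illustrative name, not part of any library.
sub read_fasta {
    my ($fh) = @_;
    my %seq;
    local $/ = ">";                   # each read now ends at the next '>'
    while (my $record = <$fh>) {
        chomp $record;                # strips the trailing '>' (the current $/)
        next unless $record =~ /\S/;  # skip the empty chunk before the first header
        my ($header, @lines) = split /\n/, $record;
        $header =~ s/^\s+|\s+$//g;    # trim whitespace around the header
        my $body = join '', @lines;   # glue the wrapped sequence lines together
        $body =~ s/\s+//g;            # remove any stray whitespace
        $seq{$header} = $body;
    }
    return %seq;
}

# Demo on an in-memory string so the sketch runs without a file.
my $demo = ">1\nPLAARR\nPRNRQP\n>2\nPLWRPA\nVPDAGR\n";
open(my $fh, '<', \$demo) or die "Cannot open in-memory handle: $!";
my %seq = read_fasta($fh);
close($fh);
print "$_: $seq{$_}\n" for sort keys %seq;
```

To use it on a real file, open the handle on the file name instead of the string, for example with open(my $fh, '<', $ARGV[0]).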
Upvotes: 1
Reputation: 301
If I understood your problem correctly, here is how you can take the file name from the command line, so that you can get results for different files. We read each line and pick up the source number from the header lines; every other line is chomped and appended to the sequence of the current source. You can then compare the content of any source with any other.
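use strict;
use warnings;

# Take the FASTA file name from the command line
my $file = $ARGV[0] or die "Usage: $0 fileA\n";
open(my $fh, '<', $file) or die "Cannot open $file: $!";
my $file_content = {};
my $src_indx = 0;
while (my $line = <$fh>) {
    chomp $line;
    $line =~ s/^\s+|\s+$//g;
    if ($line =~ /^>(\d+)/) {
        # Header line: start a new, empty sequence
        $src_indx = $1;
        $file_content->{$src_indx} = '';
    } else {
        # Sequence line: append it to the current source
        $file_content->{$src_indx} .= $line;
    }
}
close($fh);
# Compare source 1 with sources 2 and 3, as asked in the question
for my $other (2, 3) {
    print "\n\nComparison of source: 1 and $other\n";
    print "------------------------------------\n";
    print mutate($file_content->{1}, $file_content->{$other}), "\n";
}
sub mutate {
    my ($s1, $s2) = @_;
    my $temp = "";
    for (my $k = 0; $k < length($s1); $k++) {
        my $seq1 = substr($s1, $k, 1);
        # '-' marks a position past the end of a shorter second sequence
        my $seq2 = $k < length($s2) ? substr($s2, $k, 1) : '-';
        if ($seq1 ne $seq2) {
            $temp .= "[$seq1($k)/$seq2($k)]";
        }
    }
    return $temp;
}
I have kept your mutate function essentially as it was, only declaring its variables and guarding against a shorter second sequence. If you use a regular expression or split instead of substr, you can get finer control in mutate too.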
Let me know if this is not what you want.
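As a rough illustration of the split-based approach mentioned above, the sketch below breaks both sequences into arrays of single characters and compares them index by index. The name mismatches and the use of '-' for positions past the end of the shorter sequence are assumptions made for this example, not something from the original program.

```perl
use strict;
use warnings;

# Compare two sequences position by position; returns a list of
# [position, char_in_first, char_in_second] for every mismatch.
# mismatches is an illustrative name chosen for this sketch.
sub mismatches {
    my ($s1, $s2) = @_;
    my @a = split //, $s1;            # one element per character
    my @b = split //, $s2;
    my $longest = @a > @b ? scalar @a : scalar @b;
    my @diff;
    for my $k (0 .. $longest - 1) {
        my $x = $k < @a ? $a[$k] : '-';   # '-' pads the shorter sequence
        my $y = $k < @b ? $b[$k] : '-';
        push @diff, [$k, $x, $y] if $x ne $y;
    }
    return @diff;
}

# Print in the same [X(pos)/Y(pos)] format that mutate uses.
my @diff = mismatches("PLAARRP", "PLWRPA");
print join('', map { sprintf "[%s(%d)/%s(%d)]", $_->[1], $_->[0], $_->[2], $_->[0] } @diff), "\n";
```

Returning the mismatches as a list rather than a ready-made string keeps the comparison separate from the output, so the same result can be counted, filtered, or printed in another format.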
Upvotes: 0