Reputation: 2304
I found that if my fasta file ends with a single line sequence then that sequence returned by Bioperl will have one nucleotide missing. If fasta file ends with the new line then it returns complete sequence. Don't understand why? Is this a requirement for fasta files to end with an empty new line?
This the code I am using
my $obj = $db->get_Seq_by_id($id);
my $seq = $obj->seq; # returns 36 or 35 nucleotides depending if last new line exists
my $length = $obj->length; # returns 36 or 35
And the fasta sequence:
gi|37423|emb|X04588.1| Human 2.5 kb mRNA for cytoskeletal tropomyosin TM30(nm) CCCTTTAAATTTCCCTTTAAATTTCCCTTTAAATTTT
Upvotes: 1
Views: 515
Reputation: 54392
You should check that your fasta file has an even number of lines: wc -l file.fasta
.
It is a requirement that for each line in your fasta file, there must be an end of current line character: $
. If you use the vi
editor, type :set list
to reveal these hidden characters. Alternatively, try: cat -A file.fasta
to see the line endings.
Also, to be a true fasta file, your header line should begin with the >
character.
Perhaps it's not so much the evenness of lines, but rather if the last line in the file contains a newline ending. If this:
cat -A fasta.file | awk 'END { print substr ($0, length, 1) }'
does not return a dollar sign ($
), then you may have problems using your fasta file.
To replicate the problem, you can remove the last newline character from a 'good' (even lined) fasta file with this:
perl -i -pe 'chomp if eof' fasta.file
And you can add a newline to the end of your file with this:
perl -i -ne 'chomp; print "$_\n"' fasta.file
Upvotes: 1