Reputation: 23
I have a file that looks like this:
>sequence_name_16hj51
CAACCTTGGCCAT
>sequence_name_158ghni52
AATTGGCCTTGGA
>sequence_name_468rth
AAGGTTCCA
I would like to obtain this:
['CAACCTTGGCCAT', 'AATTGGCCTTGGA', 'AAGGTTCCA']
I have a list of all the sequence names, called title_finder. When I try to use:
for i in range(0,len(title_finder)):
    seq = seq.split(title_finder[i])
    print seq
I get this traceback:
Traceback (most recent call last):
  File "D:/Desktop/Python/consensus new.py", line 23, in <module>
    seq = seq.split(title_finder[i])
AttributeError: 'list' object has no attribute 'split'
Can somebody help me out?
EDIT: Sometimes some sequences span multiple lines and so I get more than one string when I do it with a for loop.
Upvotes: 1
Views: 1936
Reputation: 5658
import re

with open('test') as f:
    # keep every line that does not contain a sequence name
    lines = [line.rstrip() for line in f if not re.search('sequence_name', line)]

print(lines)
['CAACCTTGGCCAT', 'AATTGGCCTTGGA', 'AAGGTTCCA']
Upvotes: 0
Reputation: 107287
You are trying to split a list, which is what raises the AttributeError. Instead, you can read your file line by line and keep each line that doesn't start with >.
with open('file_nam') as f:
    # keep only the lines that are not '>' headers
    my_patterns = [line.rstrip() for line in f if not line.startswith('>')]
As a Pythonic alternative, if you are sure that the sequences are on every second line (starting with the second line of the file), you can use itertools.islice to slice your file object:
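from itertools import islice

with open('file_nam') as f:
    # take every second line, starting from the second one, and strip the newline
    my_patterns = [line.rstrip() for line in islice(f, 1, None, 2)]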
Note that if you only want to loop over your sequences, you don't need to convert the result of islice to a list; you can simply iterate over the iterator.
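To illustrate that last point, here is a minimal sketch that loops over the islice iterator directly instead of building a list first. The StringIO object is only a stand-in for the real file (an assumption so the example runs on its own); a file opened with open() is iterated the same way.

```python
from io import StringIO
from itertools import islice

# Stand-in for the real file; a file opened with open() iterates the same way.
fasta = StringIO(
    ">sequence_name_16hj51\n"
    "CAACCTTGGCCAT\n"
    ">sequence_name_158ghni52\n"
    "AATTGGCCTTGGA\n"
    ">sequence_name_468rth\n"
    "AAGGTTCCA\n"
)

lengths = []
# islice yields the lines at index 1, 3, 5, ... one at a time, without a list.
for line in islice(fasta, 1, None, 2):
    lengths.append(len(line.rstrip()))

print(lengths)  # [13, 13, 9]
```

This only works when every sequence occupies exactly one line; it breaks if a sequence wraps onto a second line.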
Upvotes: 1
Reputation: 22827
If you're doing bioinformatics, you should really consider installing BioPython.
from Bio import SeqIO
with open('your_file.fasta') as f:
    # SeqIO.parse yields one record per '>' header, joining multi-line sequences
    sequences = [str(record.seq) for record in SeqIO.parse(f, "fasta")]
If you want to do it in pure Python, then this will work:
with open('your_file.fasta') as f:
    print([line.rstrip() for line in f if not line.startswith('>')])
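The pure-Python filter returns one string per line, so a sequence that spans several lines (as mentioned in the question's EDIT) comes back in pieces. A minimal sketch of one way to join them, assuming every record starts with a '>' header; read_sequences is a hypothetical helper name, not part of any library:

```python
def read_sequences(lines):
    """Return one string per record, joining the lines between '>' headers."""
    sequences = []
    current = None  # stays None until the first header is seen
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # a new header closes the previous record, if there was one
            if current is not None:
                sequences.append(''.join(current))
            current = []
        elif current is not None:
            current.append(line)
    # close the last record
    if current is not None:
        sequences.append(''.join(current))
    return sequences


example = [">a\n", "CAACC\n", "TTGGCCAT\n", ">b\n", "AAGGTTCCA\n"]
print(read_sequences(example))  # ['CAACCTTGGCCAT', 'AAGGTTCCA']
```

Since the function accepts any iterable of lines, an open file object can be passed to it directly, for example read_sequences(f) inside the with block.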
Upvotes: 4
Reputation: 13550
Assuming your file is named seq.in, you can do this to get your list:
In [17]: with open('seq.in', 'r') as f:
   ....:     extracted_list = [line.rstrip('\n') for line in f if line[0] != '>']
In [18]: extracted_list
Out[18]: ['CAACCTTGGCCAT', 'AATTGGCCTTGGA', 'AAGGTTCCA']
Upvotes: 0